Forbes Advisor

Staff Engineer - SRE

Posted 14 Days Ago
In-Office or Remote
Hiring Remotely in Chennai, Tamil Nadu
Expert/Leader
The Staff Engineer - SRE is responsible for system reliability, scalability, and performance, collaborating with teams to define SLAs, manage monitoring tools, analyze system health, and maintain documentation. They implement disaster recovery processes and ensure security standards are met while overseeing complexities in cloud deployments and infrastructure issues.
The summary above was generated by AI
Company Description

Forbes Advisor is a new initiative for consumers under the Forbes Marketplace umbrella that provides journalist- and expert-written insights, news and reviews on all things personal finance, health, business, and everyday life decisions. We do this by providing consumers with the knowledge and research they need to make informed decisions they can feel confident in, so they can get back to doing the things they care about most.

If you're looking for challenges and opportunities similar to those of a startup, with the benefits of a seasoned and successful company, then read on:

Job Description

Responsibilities: 

  • The Site Reliability Engineering (SRE) team is responsible for the reliability, scalability, stability and performance of systems and services.

  • They work with cross-functional teams to design, build and maintain systems and they troubleshoot issues when they arise. They bridge the gap between development and operations teams. 

  • They work closely with business teams to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems. They also monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.

  • They deploy and manage monitoring tools to gain insights on system health and performance.

  • They analyze performance, identify bottlenecks and implement solutions to improve a system’s scalability and latency.

  • They develop scripts and implement tools and automation frameworks to reduce manual intervention in deployment, monitoring and scaling.

  • They work with development teams to design and develop observability practices such as logging, metrics and tracing. They aim to diagnose and troubleshoot issues proactively.

  • They create actionable alerts on monitoring systems to ensure a rapid response to potential production incidents.

  • They forecast resource needs and provision adequately for current and future demand.

  • They design and execute “chaos experiments” to test a system’s resilience to failure.

  • They own, define and implement the Disaster Recovery (DR) processes for systems.

  • They also conduct planned and unplanned mock DR drills to test for response preparedness during production incidents.

  • They ensure that security best practices are followed during the design and operation of systems.

  • They also own and maintain documentation of processes, playbooks, and systems.

  • They publish KPI reports and other system health updates on a regular basis to the business.
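
As an illustration of the SLO work described above, the sketch below converts an availability SLO into an error budget, that is, the downtime a system can accrue in a window before it breaches its target. It is a minimal example and not Forbes Advisor's tooling: the function names, the 30-day window and the sample target are assumptions chosen for illustration.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) that an availability SLO allows.

    slo_target is a fraction, e.g. 0.999 for "99.9% available".
    window_days is the (assumed) rolling window the SLO is measured over.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Return the fraction of the error budget still unspent.

    A negative result means the SLO has already been breached.
    """
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43.2 minutes of downtime, and an incident lasting 21.6 minutes would consume about half of that budget.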

Requirements:

  • Must-have - Bachelor's degree, preferably in CS or a related field, or equivalent experience 

  • Must-have - 12+ years of overall IT experience

  • Must-have - 7+ years of proven work experience as a Senior Site Reliability Engineer or in a similar position.

  • Must-have - 5+ years of AWS Cloud experience, with an AWS certification such as DevOps Engineer, SysOps or Security.

  • Must-have - AWS experience - 3+ years’ experience using a broad range of AWS technologies (e.g. EC2, RDS, ELB, S3, VPC, CloudWatch and monitoring tools) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security.

  • Must-have - 2+ years of experience with CDN and/or cache systems such as Fastly, Akamai, CloudFront, etc.

  • Proven understanding of, and strong experience with, cloud deployments (AWS/Docker/Kubernetes)

  • Knowledge of IaC provisioning tools and scripting languages such as Terraform, Chef, Ansible, Shell, Groovy, Python, etc.

  • Experience with monitoring systems such as CloudWatch, New Relic, Datadog, Splunk and the ELK stack.

  • Experience managing cloud network resources (AWS preferred) such as CloudWatch, VPC, URL proxies, private link, DNS, ACLs, firewalls, and C2S access points.

  • Platform or application engineering and operational knowledge of CI/CD tooling such as GitHub Actions, Jenkins, etc.

  • Experience with other tools such as JIRA, Bitbucket, Jenkins, Fortify, SonarQube, Nexus and Nexus IQ

  • Experience with configuration automation tools like Puppet/Ansible/Chef/Salt

  • Scripting Skills: Strong scripting (e.g. Bash & Python) and automation skills. 

  • Operating Systems: Windows and Linux system administration. 

  • Problem Solving: Ability to analyze and resolve complex infrastructure resource and application deployment issues 

  • Strong attention to detail. Excellent verbal and written communication skills. Strong documentation skills.

Good To Have:

  • Experience with Terraform/Ansible/Chef/Puppet 

  • Experience with GitHub Actions

  • Experience with CloudFront, Fastly

  • Oversees team members performing these functions 

  • Anticipates problems and future technical needs and takes necessary steps to address issues. 

  • Works primarily in server-side technologies and is comfortable with client-side technologies whenever required

  • Enthusiastically follows technology trends, software engineering best practices and technologies

Perks:

  • Day off on the 3rd Friday of every month (one long weekend each month)

  • Monthly Wellness Reimbursement Program to promote health and well-being

  • Paid paternity and maternity leave

Top Skills

Ansible
AWS
Bash
Bitbucket
Chef
CloudWatch
Datadog
Docker
EC2
ELB
ELK
Fortify
GitHub Actions
Jenkins
JIRA
Kubernetes
New Relic
Nexus
Python
RDS
S3
SonarQube
Terraform
VPC
