Forbes Advisor

SRE Manager

Posted Yesterday
Hybrid
Chennai, Tamil Nadu, IND
Senior level
Lead the SRE team to manage production support, ensure system reliability, and establish best practices in incident management and observability.
The summary above was generated by AI
Company Description

Forbes Advisor is a new initiative for consumers under the Forbes Marketplace umbrella that provides journalist- and expert-written insights, news and reviews on all things personal finance.

We’re dedicated to helping turn aspirations into reality. We do this by providing consumers with the knowledge and research they need to make informed financial decisions they can feel confident in, so they can get back to doing the things they care about most.

Job Description

WHAT YOU’LL DO: 

  • Lead and manage production and non-production support, ensuring high availability and system reliability.
  • Drive SRE best practices, including incident management, root cause analysis, and continuous improvement.
  • Assume ownership of major incidents and coordinate efforts to ensure quick resolution of impacting events.
  • Collaborate with SRE team members to design and develop observability practices such as dashboarding, logging, metrics, and tracing, in order to diagnose and troubleshoot issues proactively.
  • Collaborate with SRE team members to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems, and monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.
  • Identify and remove blockers, escalate appropriately, and maintain the momentum of troubleshooting efforts.
  • Ensure adherence to established incident management processes and protocols.
  • Contribute to the improvement of incident response runbooks and documentation.
  • Own internal and external communications during major incidents.
  • Translate technical details into business-impact language (scope, severity, risk, ETA, confidence level).
  • Maintain clear and continuous communication with stakeholders during incidents, providing timely updates.
  • Ensure safe execution of mitigations, rollbacks, feature flags, and failovers.
  • Lead post-incident review meetings with stakeholders to confirm event details and assign problem investigators.
  • Track and report on incident metrics, identifying patterns and areas for systemic improvement.
  • Support Change Managers and/or Problem Managers as required in carrying out their responsibilities.

Qualifications

WHAT YOU’VE DONE: 

  • Bachelor’s or master’s degree and/or equivalent experience relevant to the functional area.
  • 12+ years of experience in SRE / DevOps.
  • 5+ years of working experience as a Site Reliability Engineer.
  • Experience managing critical incidents in a 24/7 production environment.
  • Experience with ServiceNow ITSM and on-call incident coordination via PagerDuty / Zenduty (or comparable ITSM/on-call tools).

 

Knowledge, Skills, Abilities & Behaviours

  • Understanding of a broad range of technical concepts across SRE practices.
  • Background in cloud-based systems and SRE practices is a must.
  • Experience with at least one observability platform (e.g., New Relic, Datadog) preferred.
  • Ability to use AI tools to synthesize communication, reports, and troubleshooting leads.
  • Certification in AWS, ITIL, or related frameworks preferred.
  • Experience in SaaS or technology product companies preferred.
  • Strong leadership and decision-making skills under pressure.
  • Excellent verbal and written communication skills for both technical and non-technical audiences.
  • Ability to manage multiple priorities and deadlines in high-stakes situations.
  • Strong analytical skills to drive root cause analysis and trend identification.
  • Familiarity with modern monitoring and incident management tools.
  • Demonstrated ability to build consensus across diverse teams.
  • Effective at maintaining calm and focus during critical situations.
  • Knowledge of cloud infrastructure (e.g., AWS, Azure) and application architecture.
  • Proven track record of improving incident management processes.
  • Attention to detail in documentation and follow-through.
  • Adept at facilitating collaboration across remote and global teams.
  • Proactive in identifying operational risks and implementing preventive measures.
  • Committed to continuous learning and process improvement.
  • Ethical, dependable, and resilient in challenging scenarios.

Additional Information

● Day off on the 3rd Friday of every month (one long weekend each month)
● Monthly Wellness Reimbursement Program to promote health and well-being
● Monthly Office Commutation Reimbursement Program
● Paid paternity and maternity leave
