Join us as a Site Reliability Engineer
- In this key role, you’ll support the improvement of non-functional and operational characteristics such as availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning of our products and services
- You’ll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to deliver change in a safe and secure way
- This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development
- We're offering this role at associate level
As our Site Reliability Engineer, you’ll contribute to the reliability, monitoring and operational excellence of cloud-native platforms.
You’ll work closely with senior engineers to support production systems, implement SRE practices, and ensure services are observable, scalable and resilient. You’ll also participate in the 24/7 support and on-call rotation, gaining experience in incident response and platform operations.
You'll also be:
- Supporting the operation of AWS-based Kubernetes platforms (EKS)
- Contributing to monitoring, alerting and observability implementations using tools like Grafana and Prometheus
- Assisting in incident management, troubleshooting and root cause analysis
- Participating in on-call rotations and production support activities
- Implementing infrastructure changes using Terraform and GitOps workflows
- Supporting CI/CD pipelines (GitLab, Argo CD) and deployment processes
- Helping improve system reliability through automation and operational improvements
- Following SRE practices such as runbooks, documentation and post-incident reviews
- Working with DevOps and engineering teams to improve system performance and stability
- Ensuring solutions align with security, compliance and operational standards
We’re looking for an engineer with solid foundational experience in cloud platforms and a keen interest in reliability engineering and production operations.
You'll also need:
- Experience working with AWS and Kubernetes (EKS) in a production or pre-production environment
- Familiarity with monitoring and observability tools such as Grafana and Prometheus
- Understanding of CI/CD pipelines and Git-based workflows (GitLab preferred)
- Exposure to Terraform or infrastructure-as-code concepts
- Basic understanding of SRE practices and production support models
- Experience troubleshooting applications or infrastructure issues
- Awareness of networking and security fundamentals in cloud environments
- Willingness to participate in on-call rotations and incident response
- Strong problem-solving mindset and eagerness to learn
- Good communication and collaboration skills
Hours
45
Job Posting Closing Date:
16/06/2026
What you need to know about the Chennai Tech Scene
To locals, it's no secret that South India is leading the charge in big data infrastructure. While the environmental impact of data centers has long been a concern, emerging hubs like Chennai are favored by companies seeking ready access to renewable energy resources, which provide more sustainable and cost-effective solutions. As a result, Chennai, along with neighboring Bengaluru and Hyderabad, is poised for significant growth, with a projected 65 percent increase in data center capacity over the next decade.
