Design, automate, and maintain highly available AWS cloud infrastructure. Build and operate containerized workloads (ECS/Fargate), manage RDS/PostgreSQL, and implement IaC (Terraform/CloudFormation), CI/CD pipelines, caching, and observability. Handle incident response and troubleshooting, and collaborate with engineering teams to improve reliability and scalability.
We are looking for a Site Reliability Engineer - Chennai
About the job
What You'll Do
You will join our high-performance Cloud Engineering team and play a critical role in designing, automating, and maintaining highly available, scalable, and resilient cloud infrastructure platforms that power enterprise-grade applications and digital experiences.
- Design, implement, and manage scalable AWS cloud infrastructure to support mission-critical business applications.
- Build, deploy, and maintain containerized workloads using AWS ECS and AWS Fargate.
- Manage and optimize relational database infrastructure leveraging AWS RDS and PostgreSQL.
- Develop and maintain Infrastructure as Code (IaC) frameworks to enable automated, repeatable, and consistent infrastructure deployments.
- Design, implement, and optimize CI/CD pipelines to improve deployment efficiency, reliability, and release velocity.
- Implement and optimize high-performance caching strategies to improve application responsiveness, scalability, and overall system performance.
- Ensure infrastructure availability, reliability, scalability, and security across production and non-production environments.
- Monitor platform health, system performance, and application availability while proactively identifying and addressing potential issues.
- Troubleshoot and resolve complex infrastructure, networking, and platform-related issues across distributed systems.
- Perform root cause analysis for production incidents and drive preventive actions to improve system reliability.
- Collaborate closely with software engineering, architecture, DevOps, and QA teams to support cloud-native application delivery.
- Establish and implement reliability engineering best practices including observability, automation, incident management, and capacity planning.
- Document operational procedures, architecture decisions, and best practices to ensure knowledge sharing and operational excellence.
- Contribute to continuous improvement initiatives focused on system reliability, operational efficiency, and infrastructure modernization.
What We Seek In You
- 5+ years of experience designing, implementing, managing, and troubleshooting scalable cloud infrastructure environments.
- Strong hands-on expertise in AWS cloud services and cloud-native architecture patterns.
- Proven experience working with containerized environments using AWS ECS and AWS Fargate.
- Strong experience managing relational database platforms including AWS RDS and PostgreSQL.
- Advanced proficiency in Infrastructure as Code (IaC) tools such as Terraform, AWS CloudFormation, or equivalent technologies.
- Extensive experience designing and implementing CI/CD automation pipelines using modern DevOps practices and tools.
- Strong expertise implementing and optimizing high-performance caching layers and distributed caching solutions.
- Solid understanding of cloud architecture principles including high availability, scalability, fault tolerance, and disaster recovery.
- Strong troubleshooting and analytical skills with the ability to diagnose issues across infrastructure, networking, and applications.
- Experience working in Linux-based environments and cloud-native ecosystems.
- Strong understanding of Site Reliability Engineering principles including observability, monitoring, automation, and incident management.
- Experience with monitoring, logging, and observability tools for proactive system management.
- Excellent communication, stakeholder management, and collaboration skills.
- Experience working within Agile and DevOps delivery environments.
Preferred Qualifications
- Hands-on experience deploying and managing workloads within AWS China regions.
- Strong understanding of network-level troubleshooting including VPC, DNS, routing, load balancers, security groups, and connectivity diagnostics.
- Experience troubleshooting distributed systems and high-scale cloud-native applications.
- Familiarity with highly scalable API platforms and backend services supporting enterprise applications.
- Exposure to CDN, caching architectures, and edge delivery strategies.
- Knowledge of Kubernetes, Docker, and modern container orchestration technologies is an added advantage.
- Experience with performance testing, load testing, and capacity planning methodologies.
- Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related discipline.
Life At Next
At our core, we're driven by the mission of tailoring growth for our customers by enabling them to transform their aspirations into tangible outcomes. We're dedicated to empowering them to shape their futures and achieve ambitious goals. To fulfil this commitment, we foster a culture defined by agility, innovation, and an unwavering commitment to progress. Our organizational framework is both streamlined and vibrant, characterized by a hands-on leadership style that prioritizes results and fosters growth.
Perks Of Working With Us
- Clear objectives to ensure alignment with our mission, fostering your meaningful contribution.
- Abundant opportunities for engagement with customers, product managers, and leadership.
- Follow progressive career paths with insightful guidance from managers through ongoing feedforward sessions.
- Cultivate and leverage robust connections within diverse communities of interest. Choose your mentor to navigate your current endeavors and steer your future trajectory.
- Embrace continuous learning and upskilling opportunities through Nexversity.
- Enjoy the flexibility to explore various functions, develop new skills, and adapt to emerging technologies.
- Embrace a hybrid work model promoting work-life balance.
- Access comprehensive family health insurance coverage, prioritizing the well-being of your loved ones.
- Embark on accelerated career paths to actualize your professional aspirations.
Who We Are
We enable high-growth enterprises to build hyper-personalized solutions that transform their vision into reality. With a keen eye for detail, we apply creativity, embrace new technology, and harness the power of data and AI to co-create solutions tailor-made to meet the unique needs of our customers.
Join our passionate team and tailor your growth with us!
TVS Next Chennai, Tamil Nadu, IND Office
Chennai, India