Site Reliability Engineer III

Sorry, this job was removed at 08:13 a.m. (IST) on Wednesday, May 28, 2025

In-Office or Remote

Hiring Remotely in Delhi, Connaught Place, New Delhi, Delhi

About HighLevel:

HighLevel is a cloud-based, all-in-one white-label marketing and sales platform that empowers marketing agencies, entrepreneurs, and businesses to elevate their digital presence and drive growth. We are proud to support a global and growing community of over 2 million businesses, from marketing agencies to entrepreneurs to small businesses and beyond. Our platform empowers users across industries to streamline operations, drive growth, and crush their goals.

HighLevel processes over 15 billion API hits and handles more than 2.5 billion message events every day. Our platform manages 470 terabytes of data distributed across five databases, runs on a network of over 250 microservices, and supports over 1 million domain names.

Our People

With over 1,500 team members across 15+ countries, we operate in a global, remote-first environment. We are building more than software; we are building a global community rooted in creativity, collaboration, and impact. We take pride in cultivating a culture where innovation thrives, ideas are celebrated, and people come first, no matter where they call home.

Our Impact

Every month, our platform powers over 1.5 billion messages, helps generate over 200 million leads, and facilitates over 20 million conversations for the more than 2 million businesses we serve. Behind those numbers are real people growing their companies, connecting with customers, and making their mark, and we get to help make that happen.

Learn more about us on our YouTube Channel or Blog Posts

About the Role:

We are looking for a Site Reliability Engineer to join our team and help ensure the availability, performance, and scalability of our critical systems. You will work closely with development and operations teams to automate processes, enhance system reliability, and improve observability.

Requirements:

Experience: 4+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
Cloud Expertise: Hands-on experience with GCP and AWS
Infrastructure as Code (IaC): Terraform, Helm, or equivalent tools
Containerisation & Orchestration: Docker, Kubernetes (GKE)
Observability: Experience with Prometheus, Grafana, ELK, OpenTelemetry, or similar monitoring/logging tools
Programming/Scripting: Proficiency in Python, Bash, or other shell scripting; basic understanding of parsing API responses and manipulating JSON
CI/CD Pipelines: Hands-on experience with Jenkins, GitHub Actions, ArgoCD, or similar tools
Incident Management: Experience with on-call rotations, SLOs, SLIs, SLAs, escalation policies, and incident resolution
Databases: Experience monitoring MongoDB, Redis, Elasticsearch (ES), and queue-based systems

Responsibilities:

Develop and improve observability using monitoring, logging, tracing, and alerting tools (Prometheus, Grafana, ELK, OpenTelemetry, etc.)
Optimize system performance, troubleshoot incidents, and conduct post-mortems/RCA to prevent future issues
Collaborate with developers to enhance application reliability, scalability, and performance
Drive cost optimisation efforts in cloud environments
Monitor multiple data stores, including MongoDB, Redis, Elasticsearch (ES), and queue-based systems

EEO Statement:

The company is an Equal Opportunity Employer. As an employer subject to affirmative action regulations, we invite you to voluntarily provide the following demographic information. This information is used solely for compliance with government recordkeeping, reporting, and other legal requirements. Providing this information is voluntary and refusal to do so will not affect your application status. This data will be kept separate from your application and will not be used in the hiring decision.
