Qualys

Lead Site Reliability Engineer, DevOps

Posted 20 Days Ago
In-Office
Pune, Mahārāshtra
Senior level
The Senior Site Reliability Engineer will enhance observability and reliability in large distributed systems through monitoring, incident response, and automation.

Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!

Job Title

Senior Site Reliability Engineer (SRE) – Observability & DevOps

Role Summary

We are looking for a Senior SRE who will own and evolve our observability and reliability platform. The ideal candidate has strong Linux fundamentals, hands-on experience with modern monitoring stacks, and the ability to design scalable alerting and metrics pipelines for large, distributed systems.

This role requires both deep technical expertise and a production ownership mindset.

Primary Responsibilities

Observability & Monitoring
  • Design, implement, and maintain end-to-end observability using:
    • Prometheus for metrics collection
    • Alertmanager for alert routing, deduplication, and escalation
    • Grafana for visualization and dashboards
    • AppDynamics for APM, transaction tracing, and application health
  • Build actionable dashboards for:
    • SLIs, SLOs, and error budgets
    • Application, infrastructure, and platform health
  • Reduce alert fatigue by implementing signal-based alerting and proper severity models
Data & Metrics Platform
  • Manage and optimize ClickHouse for:
    • High-volume metrics, logs, or traces
    • Long-term retention and fast analytical queries
  • Work on schema design, performance tuning, and cost optimization
Reliability & Operations
  • Define SRE best practices and measure reliability against SLIs, SLOs, and SLAs
  • Participate in incident response, postmortems, and root cause analysis
  • Drive reliability improvements through automation and capacity planning
Automation & Engineering
  • Develop tooling and automation using at least one scripting/programming language
  • Automate monitoring onboarding, alert generation, and dashboard creation
  • Improve operational efficiencies across DevOps tooling
Required Technical Skills (Must-Have)

Core Skills
  • Strong Linux fundamentals
    • Troubleshooting, performance tuning, networking, system internals
  • Scripting / Programming (Any one or more):
    • Python (preferred), Bash, Go, or similar
  • Observability Tools (Hands-on):
    • Prometheus
    • Alertmanager
    • Grafana
    • AppDynamics
  • Data Platform:
    • Hands-on experience with ClickHouse
Monitoring & Alerting Concepts
  • Metrics vs logs vs traces
  • Golden signals (latency, traffic, errors, saturation)
  • Alert thresholds, routing policies, escalation strategies
Preferred / Nice-to-Have Skills
  • Kubernetes monitoring (Prometheus Operator, kube-state-metrics)
  • Infrastructure as Code (Terraform, Helm)
  • CI/CD observability
  • Cloud platforms (AWS / Azure / GCP)
  • Experience managing observability at scale (100+ services / platforms)
Senior-Level Expectations
  • Ability to architect observability solutions, not just operate them
  • Strong production troubleshooting and incident ownership
  • Mentoring junior engineers
  • Influence DevOps and SRE best practices across teams
  • Communicate clearly with developers and leadership
Experience & Qualification
  • 5-7 years of experience in SRE / DevOps / Production Engineering
  • Experience operating high-availability, large-scale systems
  • Proven background in observability-driven reliability improvements

Top Skills

Alertmanager
AppDynamics
Bash
ClickHouse
Go
Grafana
Prometheus
Python
