Site Reliability Engineer

SingleStore

Reposted 5 Days Ago

Remote

Hiring Remotely in India

Senior level

As a Site Reliability Engineer, you will optimize SingleStore's managed services across clouds, automate infrastructure, debug live issues, and improve customer experiences.

Position Overview

SingleStore is seeking a Site Reliability Engineer to help optimize and scale our managed service offering across all three major cloud providers. In this role, you will be at the intersection of leading technology trends: a highly performant distributed database, managed by Kubernetes, running in the cloud. This is a great opportunity to push the boundaries in a cloud-focused SRE role.

This is a development role, requiring an engineering mindset to solve operational challenges. You will be part of a globally distributed team of engineers, helping to drive SRE practices across the company. Through infrastructure automation, you will help us grow our service across multiple cloud platforms. This requires a relentless focus on eliminating manual processes. You will also leverage our monitoring platform to improve the overall customer experience by systematically identifying and fixing any issues impacting our customers. As an SRE, you will also help diagnose issues on the platform, leveraging a deep understanding of the SingleStore query engine along with the backend infrastructure.

Roles and Responsibilities

Develop an automation platform to manage infrastructure rollouts across cloud providers
Optimize the telemetry platform to identify customer-impacting events while providing relevant data to drive debugging
Partner with the engineering team to optimize the performance of services for cloud architecture
Debug live-site events and conduct follow-up postmortems and root cause analysis (RCA)
Participate in an SLA-driven on-call rotation, which will include after-hours, weekend, and rotating holiday participation

Required Skills and Experience

5 years of demonstrated experience working as a Site Reliability Engineer
Infrastructure automation experience. Scripting experience (Python, Bash) a plus.
Experience with the Prometheus monitoring stack. Experience with Grafana, Mimir and Loki is a plus.
Knowledge of Kubernetes and the container ecosystem
Strong cross-group collaboration and communication skills
Familiarity with at least one of AWS, Azure, or Google Cloud
Experience debugging, diagnosing, and troubleshooting complex production software
B.S. degree in Computer Science or a related field

SingleStore is a global database company that empowers the world’s leading organizations to build and scale cutting-edge AI applications on a unified data platform that supports real-time transactions, analytics, and search. Our platform handles streaming data ingestion, vector search, full-text search, and multi-model data types, all with high performance, petabyte-scale capacity, high user concurrency, and low latency.
As a leader recognized by both Gartner and Forrester Wave, SingleStore serves the world’s leading data innovators, including the top Fortune 500 enterprises. Our 95%+ gross retention rate reflects the strong satisfaction and trust our customers place in the platform. SingleStore is owned by private equity firm Vector Capital and is headquartered in San Francisco, with offices worldwide, including Hyderabad.
To all recruitment agencies: SingleStore does not accept agency resumes. Please do not forward resumes to SingleStore employees. SingleStore is not responsible for any fees related to unsolicited resumes and will not pay fees to any third-party agency or company that does not have a signed agreement with the Company.

Req ID: ENG00445
