ScyllaDB

Site Reliability Engineer

Posted Yesterday
In-Office or Remote
Hiring Remotely in Bangalore, Bengaluru Urban, Karnataka
Mid level
As a Site Reliability Engineer at ScyllaDB, you will manage cloud operations, enhance the reliability and performance of Scylla Cloud, and automate tasks through scripting.
The summary above was generated by AI

ScyllaDB is seeking experienced and dynamic individuals to join our Cloud Operations & Site Reliability Engineering (SRE) team. As a Scylla Cloud Operations & SRE Engineer, you will play a vital role in maintaining the operational excellence of our cutting-edge NoSQL database platform, Scylla Cloud. Leveraging your expertise in cloud infrastructure, AI, and system operations, you will ensure the reliability, scalability, and performance of our cloud offerings. If you are passionate about working in a fast-paced environment, collaborating with cross-functional teams, and driving continuous improvement, this role is tailored for you.

Applicants for this position should be able to start their workday at any time between 00:00 and 10:00 GMT.

Responsibilities:

  • Collaborate with the Support & DevOps teams to ensure the smooth day-to-day operation of Scylla Cloud.
  • Monitor system health, troubleshoot issues, and proactively address any operational challenges.
  • Act as a liaison with the Support Organization to address cloud platform-related issues.
  • Respond to tasks and tickets escalated by Support Staff, and collaborate to ensure timely resolutions.
  • Develop and maintain a comprehensive runbook that can be leveraged by Support Staff to troubleshoot and resolve common issues, improving efficiency in issue resolution.
  • Create scripts and automation solutions to streamline operational tasks and enhance efficiency.
  • Contribute to the development of automation strategies for cloud infrastructure management.
  • Assist and perform migrations of ScyllaDB clusters between clouds and accounts.
  • Assist and perform upgrades for Scylla Cloud, including Scylla database versions, OS upgrades, and security patches.
  • Collaborate with DevOps/Cloud Engineering to ensure seamless upgrade processes.
  • Participate in scaling Scylla Monitor and Scylla Manager servers up and down based on demand. Employ proactive monitoring strategies to identify and address potential performance bottlenecks and resource constraints.
  • Feature Requests: Collaborate with the Cloud Engineering team to define and create feature requests that enhance the functionality and performance of Scylla Cloud.
  • Conduct regular cluster health and performance audits, identifying areas for optimization. Implement strategies to enhance the efficiency and reliability of Scylla Cloud clusters.
  • Work closely with the Customer Success team to ensure that provisioned resources align with customer needs and purchased packages. Provide insights into potential scaling opportunities and usage optimization.
  • Demonstrate a deep understanding of public cloud environments (AWS, GCP, Azure), Kubernetes, Linux system operations, and NoSQL database deployment/management. Apply this knowledge to resolve complex technical challenges.
  • Use scripting languages such as Python and Bash, together with infrastructure tools such as Terraform and Ansible, to create automation that enhances operational efficiency.
  • Cross-Functional Collaboration: Collaborate closely with Support and Engineering teams to address issues, drive improvements, and implement customer-focused solutions.
  • Utilize AI effectively and securely to optimize tasks and automation.

Requirements:

  • 3+ years of experience in public cloud platforms (AWS, GCP, Azure).
  • 3+ years of Linux system operations and metrics analysis.
  • Availability to begin work between 00:00 and 10:00 GMT.
  • Strong scripting skills in Python and Bash.
  • Experience with reporting and visualization tools such as Splunk, Grafana, Prometheus, and Kibana.
  • Excellent written and verbal English communication skills.
  • Exceptional organizational skills and ability to manage multiple projects concurrently.
  • Ability to work both independently and collaboratively within cross-functional teams.
  • Strong problem-solving skills, especially under pressure.
  • Eagerness to continuously learn and adapt to emerging technologies.
  • Familiarity with container technologies like Docker and Kubernetes.
  • Familiarity with automation tools such as Ansible and Terraform.

Nice to Have:

  • Experience with AI-assisted scripting/coding with tools such as Cursor, Windsurf, Kiro, Antigravity, or Claude Code.
  • Proficiency with automation tools such as Ansible and Terraform.
  • 3+ years of Argo Workflow or Jenkins experience.
  • Proven expertise in NoSQL database deployment, management, and data modeling.

If you are passionate about contributing to the success of ScyllaDB's cloud offerings and thrive in a dynamic and collaborative environment, we invite you to join our Cloud Operations & SRE team. Your technical expertise, problem-solving skills, and dedication will play a crucial role in ensuring the reliability and performance of Scylla Cloud for our global customer base.


Top Skills

Ansible
AWS
Azure
Bash
GCP
Grafana
Kibana
Kubernetes
Linux
NoSQL
Prometheus
Python
Splunk
Terraform
