Genesys

Senior Operations Reliability Engineer – Cloud Infrastructure (AWS & Windows)

In-Office
Chennai, Tamil Nadu
Senior level
As a Senior Operations Reliability Engineer, you'll enhance cloud reliability and automation, manage incidents, and troubleshoot AWS and Windows systems.

Genesys empowers organizations of all sizes to improve loyalty and business outcomes by creating the best experiences for their customers and employees. Through Genesys Cloud, the AI-powered Experience Orchestration platform, organizations can accelerate growth by delivering empathetic, personalized experiences at scale to drive customer loyalty, workforce engagement, efficiency and operational improvements.

We employ more than 6,000 people across the globe who embrace empathy and cultivate collaboration to succeed. While we offer great benefits and perks like larger tech companies, our employees have the independence to make a larger impact on the company and take ownership of their work. Join the team and help create the future of customer experience.

Overview 

As a Senior Operations Reliability Engineer with a specialization in Cloud Infrastructure, you will play a key role in maintaining and improving the reliability, stability, and operational maturity of enterprise cloud and compute environments. This role focuses primarily on AWS infrastructure, with supporting responsibility for Azure and Windows-based systems. 

You will lead incident detection, advanced troubleshooting, patching and vulnerability remediation validation, and proactive reliability improvements across AWS services and Windows/Linux compute platforms. In addition to hands-on operational support, you will actively contribute to automation initiatives, AIOps tuning, and the continuous improvement of monitoring, correlation, and signal quality. 

This role blends advanced cloud operations with reliability engineering practices, including event correlation, automation validation, telemetry refinement, and support for emerging self-healing capabilities. You will collaborate across Cloud Engineering, Security, IAM, Network, and ServiceNow teams to strengthen operational standards and accelerate automation maturity across the platform. 

 

Responsibilities 

General Reliability Operations 

  • Resolve complex cloud and OS-related incidents through advanced troubleshooting, coordinating cross-functional teams when necessary. 

  • Monitor observability, AIOps, and event management platforms to detect anomalies, performance degradation, and emerging risks across AWS and compute systems. 

  • Perform advanced incident triage and event correlation to determine root cause and reduce misrouted or duplicate incidents. 

  • Lead validation of automated remediation workflows and ensure reliability of automation before production adoption. 

  • Identify recurring manual operational tasks and translate them into automation requirements or lightweight scripted solutions. 

  • Contribute structured operational insights, telemetry improvements, and signal refinement recommendations to reduce alert noise. 

  • Lead post-incident reviews, including root cause documentation and reliability improvement actions. 

  • Ensure cloud and OS telemetry aligns with monitoring, governance, and CMDB standards to support accurate correlation and impact analysis. 

  • Partner with Cloud Engineering, Security, IAM, and Network teams to mature reliability practices and reduce operational risk. 

 

Cloud & Windows Infrastructure Responsibilities 

  • Troubleshoot advanced AWS operational issues, including EC2 performance anomalies, networking misconfigurations, IAM policy conflicts, storage degradation, and service dependency failures. 

  • Support Azure VM and cloud service troubleshooting where applicable, ensuring cross-cloud awareness. 

  • Perform deep OS-level diagnostics and remediation primarily within Windows Server environments, with supporting responsibilities across Linux systems. 

  • Analyze telemetry from AWS CloudWatch, system logs, and vulnerability management platforms to detect trends and systemic weaknesses. 

  • Own validation and oversight of patching and vulnerability remediation workflows, primarily for Windows systems with a supporting role for Linux systems, ensuring compliance and reducing drift. 

  • Improve tagging compliance, IAM access hygiene, backup validation, and governance posture through operational enforcement and automation. 

  • Validate and support resilience testing (backup restores, failover simulations, DR exercises). 

  • Contribute to infrastructure-as-code (Terraform) enhancements. 

  • Develop scripts (PowerShell, Python, CLI-based automation) to improve repeatability and reduce manual effort. 

  • Participate in readiness planning for new AWS services, infrastructure changes, or architectural updates, ensuring monitoring and operational support models are in place. 

  • Provide mentorship and technical guidance to junior reliability engineers. 

 

Automation & AIOps Contributions 

  • Actively tune alert thresholds, suppression logic, and event correlation rules within AIOps and monitoring platforms. 

  • Partner with teams to refine automated remediation logic and validate reliability before rollout. 

  • Improve cloud signal quality by ensuring accurate metrics, logs, and dependency mapping across AWS services. 

  • Contribute operational feedback to enhance predictive alerting and early detection models. 

  • Track and support improvements in MTTR, alert noise reduction, patch compliance, and automation coverage. 

 

Requirements 

  • Bachelor’s degree in IT or related field, or equivalent experience. 

  • 5+ years of experience in cloud infrastructure, systems engineering, or infrastructure operations roles. 

  • Strong hands-on experience with AWS services (EC2, VPC, IAM, EBS, S3, CloudWatch, networking). 

  • Familiarity with Azure cloud environments. 

  • Solid experience administering Windows Server; working knowledge of Linux systems. 

  • Experience with patch management, vulnerability remediation, and system hardening. 

  • Strong understanding of cloud governance principles (tagging, IAM access control, backups, cost awareness, compliance). 

  • Experience working with monitoring, observability, and event management platforms. 

  • Ability to write and modify automation scripts (PowerShell, Python, CLI tools, YAML/JSON). 

  • Strong troubleshooting and analytical skills with the ability to interpret complex telemetry and log data. 

  • Experience contributing to automation initiatives or reliability improvements. 

  • Effective communication skills for cross-functional collaboration. 

  • Motivation to continue developing deeper skills in automation, AIOps, infrastructure-as-code, and cloud reliability engineering. 

 

Additional Information 

  • Working Hours: 9:00 AM – 6:00 PM IST (first shift), supporting global platform operations. 

  • On-Call Support: Participation in a shared, rotational on-call schedule is required. 

If a Genesys employee referred you, please use the link they sent you to apply.

About Genesys:

Genesys® empowers more than 8,000 organizations worldwide to create the best customer and employee experiences. With agentic AI at its core, Genesys Cloud™ is the AI-Powered Experience Orchestration platform that connects people, systems, data and AI across the enterprise. As a result, organizations can drive customer loyalty, growth and retention while increasing operational efficiency and teamwork across human and AI workforces. To learn more, visit www.genesys.com.

Reasonable Accommodations:

If you require a reasonable accommodation to complete any part of the application process, or are limited in your ability to access or use this online application and need an alternative method for applying, you or someone you know may contact us at [email protected].

You can expect a response within 24–48 hours. To help us provide the best support, click the email link above to open a pre-filled message and complete the requested information before sending. If you have any questions, please include them in your email.

This email is intended to support job seekers requesting accommodations. Messages unrelated to accommodation—such as application follow-ups or resume submissions—may not receive a response.

Genesys is an equal opportunity employer committed to fairness in the workplace. We evaluate qualified applicants without regard to race, color, age, religion, sex, sexual orientation, gender identity or expression, marital status, domestic partner status, national origin, genetics, disability, military and veteran status, and other protected characteristics.

Please note that recruiters will never ask for sensitive personal or financial information during the application phase.

Top Skills

AWS
Azure
Linux
PowerShell
Python
Terraform
Windows Server

Genesys Chennai, Tamil Nadu, IND Office

Park, Block C, 7th Floor, Plot No. 40, M.G.R Salai, Perungudi, Chennai, Tamil Nadu, India, 600 096
