Synechron

Site Reliability Engineer (SRE) with AWS, Oracle, and Automation Expertise

Posted Yesterday

In-Office or Remote

Hiring Remotely in Hinjawadi, Pune, Mahārāshtra, IND

Senior level

The Site Reliability Engineer will enhance platform stability and operational maturity, manage service health, implement automation, and support incident management.

Job Summary
Synechron is seeking an experienced Site Reliability Engineer (SRE) to enhance the stability, resilience, and operational maturity of our critical Financial Crime and Transaction Monitoring platforms. This role is vital in embedding SRE best practices across observability, automation, incident management, and production support. The successful candidate will be responsible for proactively managing service health, reducing operational risks, and supporting regulatory-critical services, thereby enabling the organization to deliver reliable, scalable, and compliant solutions aligned with business objectives.

Software Requirements

Required:
- Strong understanding and hands-on experience managing production-grade systems with high reliability and availability requirements
- Expertise in SRE principles, monitoring, logging, and alerting, including defining and tuning SLOs and SLAs
- Proficiency with AWS services including EC2, S3, RDS, VPC, IAM, and CloudWatch (latest versions or equivalents)
- Linux system administration and troubleshooting skills for enterprise environments
- Experience with Oracle databases, including performance tuning, RAC, or RMAN in large data environments
- Automation scripting skills using Python and Shell (Bash/sh) for operational automation
- Experience with monitoring tools such as Prometheus, Grafana, ELK/EFK, and PagerDuty
- Familiarity with CI/CD tools like Jenkins, GitLab CI, or AWS CodePipeline

Preferred:
- Knowledge of OFSAA, Oracle Rules Engine, or ML-enabled platform support (e.g., TRACE)
- Infrastructure-as-Code tools such as CloudFormation or Terraform
- Experience with support for high-performance Oracle environments (performance tuning, RAC, RMAN)
- Exposure to cloud-native and containerized environments (Kubernetes, Docker)

Overall Responsibilities

Improve the reliability, availability, and recoverability of Financial Crime and Transaction Monitoring platforms.
Define, monitor, and manage SLIs/SLOs to proactively ensure service health and detect anomalies.
Provide Level 1 and Level 2 support for AWS and Oracle-based platforms, handling incident resolution and root cause analysis.
Build and sustain automation solutions for monitoring, logging, alerting, and operational workflows to reduce manual toil.
Lead incident response activities, conduct post-incident reviews, and implement preventative measures.
Develop, operate, and enhance CI/CD pipelines and infrastructure automation across environments.
Collaborate with engineering teams to design scalable, resilient, and secure systems; participate in capacity planning and performance tuning.
Support deployment, patching, and configuration changes, ensuring compliance with policies and standards.
Maintain comprehensive documentation of operational procedures, configurations, and incident resolutions.
Lead continuous process improvements to enhance system reliability, operational efficiency, and compliance adherence.

Technical Skills (By Category)

Systems & Support (Essential):
- Enterprise-level system operation and support for AWS and Oracle environments
- Linux system administration and troubleshooting
- Incident management and escalation procedures

Monitoring & Automation (Essential):
- Monitoring and alerting using Prometheus, Grafana, ELK/EFK, CloudWatch
- Automation scripting with Python and Shell for operational tasks and event handling

Cloud & Infrastructure (Preferred):
- Cloud deployment, scaling, and management (AWS, Azure, GCP)
- Infrastructure-as-Code (Terraform, CloudFormation)

Databases/Data Management (Essential):
- Oracle database management, performance tuning, and recovery
- Data extraction and validation for high-volume transactional data

Development Tools & Methodologies (Essential):
- Jenkins, GitLab CI, AWS CodePipeline for CI/CD pipelines
- Version control with Git

Experience Requirements

Minimum of 8 years supporting high-availability, mission-critical enterprise systems, particularly in financial services or comparable regulated environments.
Proven experience supporting Oracle databases, Oracle RAC, or RMAN in a high-volume context.
Strong background in enterprise support for Financial Crime and Transaction Monitoring platforms.
Demonstrated ability to lead operational support teams, manage incident escalations, and implement automation solutions.
Experience in cloud-native architecture, infrastructure automation, and observability tools.
Support experience working under regulatory and audit constraints is preferred.

Day-to-Day Activities

Monitor platform dashboards, logs, and alerts to ensure system health and performance.
Proactively troubleshoot and resolve operational, performance, and security incidents.
Conduct root cause analysis, document incident reports, and lead corrective action plans.
Automate routine operational tasks, alerts, and workflows to improve efficiency.
Collaborate with platform engineers, developers, and security teams on change management and capacity planning.
Participate in on-call rotations, incident reviews, and readiness exercises.
Continuously evaluate and recommend tools, procedures, and automation that improve reliability and reduce manual intervention.
Maintain detailed documentation of configurations, procedures, and lessons learned.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related discipline.
8+ years supporting enterprise-scale, high-availability systems with a focus on operational excellence.
Experience supporting regulatory-critical platforms in financial services, especially in Fraud, Risk, or Transaction Monitoring.
Certifications in cloud platforms (AWS Certified Solutions Architect, Azure) and SRE foundations (Google SRE or equivalent) are advantageous.
Proven track record of automation, incident management, and operational improvements.

Professional Competencies

Critical thinking and analytical skills to diagnose and resolve complex operational issues.
Leadership and team management skills to guide operational teams and support team development.
Effective communication for stakeholder reporting, incident updates, and cross-team collaboration.
Ability to work under pressure, prioritize multiple tasks, and meet strict SLAs.
Adaptability to evolving technology landscapes and regulatory requirements.
Focus on continuous improvement, automation, and operational excellence.

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and is an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. We strongly believe that a diverse workforce helps build stronger, successful businesses as a global company. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Top Skills

AWS, CloudFormation, CloudWatch, ELK, GitLab CI, Grafana, Jenkins, Oracle, Prometheus, Python, Shell, Terraform
