Zafin

Cloud Site Reliability Engineer II

Posted 2 Days Ago
Chennai, Tamil Nadu
Senior level
The Cloud Site Reliability Engineer II will manage complex technical issues in Zafin's cloud environment and enhance operational reliability. Responsibilities include conducting root cause analysis, optimizing cloud infrastructure, mentoring junior engineers, and automating operational processes while driving strategic initiatives across cross-functional teams.
The summary above was generated by AI

Who we are

Founded in 2002, Zafin offers a SaaS product and pricing platform that simplifies core modernization for top banks worldwide. Our platform enables business users to work collaboratively to design and manage pricing, products, and packages, while technologists streamline core banking systems. 

With Zafin, banks accelerate time to market for new products and offers while lowering the cost of change and achieving tangible business and risk outcomes. The Zafin platform increases business agility while enabling personalized pricing and dynamic responses to evolving customer and market needs. 

Zafin is headquartered in Vancouver, Canada, with offices and customers around the globe including ING, CIBC, HSBC, Wells Fargo, PNC, and ANZ. Zafin is proud to be recognized as a top employer and certified Great Place to Work® in Canada, India and the UK.  


Job Summary

Zafin is seeking a Cloud Site Reliability Engineer II (CSRE II) to lead strategic initiatives that ensure the reliability, scalability, and performance of our cloud infrastructure and applications. This advanced role requires mastery of cloud technologies, strategic planning, and incident management to drive innovative solutions and operational excellence.

As a CSRE II, you will influence the direction of cloud reliability strategies, mentor junior engineers, and lead significant projects that have a broad organizational impact. This position reports directly to the VP of Cloud Services and requires a proactive, collaborative mindset to achieve operational and strategic objectives.

Key Responsibilities

  • Lead and manage the resolution of complex technical issues involving Zafin’s products and Azure cloud environment.
  • Design and implement strategic operational enhancements to improve resiliency and system reliability.
  • Conduct in-depth Root Cause Analysis (RCA) for high-severity incidents and drive initiatives to reduce error recurrence.
  • Represent the organization in external client escalation calls, providing expert guidance and solutions.
  • Architect and optimize cloud infrastructure for high performance, scalability, and cost-effectiveness.
  • Provide thought leadership in managing and scaling container orchestration platforms such as AKS and OpenShift.
  • Oversee the implementation of advanced monitoring solutions and integrate predictive analytics for proactive issue resolution.
  • Develop and execute automation strategies to streamline operational workflows and incident responses.
  • Create and maintain comprehensive documentation of cloud architectures, processes, and incident management strategies.
  • Mentor and coach junior engineers, fostering a culture of continuous learning and innovation.
  • Drive strategic initiatives, collaborating with cross-functional teams to achieve organizational objectives.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s degree preferred).
  • 12+ years of experience in cloud support, operations, or a related role.
  • Advanced expertise in Microsoft Azure (preferred) or equivalent cloud platforms.
  • Demonstrated experience in designing and scaling container orchestration systems like AKS or OpenShift.
  • Proven leadership in managing automated deployment pipelines, including Azure DevOps.
  • Mastery of enterprise monitoring platforms (e.g., Azure Insights, Grafana) and predictive analytics tools.
  • Advanced scripting skills with PowerShell, Python, or similar languages.
  • Extensive experience in incident management and defining SLAs for global production environments.
  • In-depth knowledge of database management, particularly Postgres.

Preferred Qualifications

  • Advanced certifications in cloud platforms (e.g., Azure Solutions Architect Expert).
  • Experience with ITSM tools and processes (e.g., ServiceNow).
  • Comprehensive understanding of security and compliance in cloud environments.

Soft Skills

  • Exceptional analytical and problem-solving abilities.
  • Strong leadership and mentoring skills.
  • Advanced communication and collaboration capabilities.
  • Visionary approach to operational innovation and strategic planning.

What’s in it for you

Joining our team means being part of a culture that values diversity, teamwork, and high-quality work. We offer competitive salaries, annual bonus potential, generous paid time off, paid volunteering days, wellness benefits, and robust opportunities for professional growth and career advancement. Want to learn more about what you can look forward to during your career with us? Visit our careers site to see our openings: zafin.com/careers

Zafin welcomes and encourages applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the selection process. 

Zafin is committed to protecting the privacy and security of the personal information collected from all applicants throughout the recruitment process. The methods by which Zafin collects, uses, stores, handles, retains, or discloses applicant information can be accessed by reviewing Zafin’s privacy policy at https://zafin.com/privacy-notice/. By submitting a job application, you confirm that you agree to the processing of your personal data by Zafin as described in the candidate privacy notice.

Top Skills

AKS
Azure
OpenShift
Postgres
PowerShell
Python

Zafin Chennai, Tamil Nadu, IND Office

TVH Agnitio Park 2nd Floor 141, Rajiv Gandhi Salai, Perungudi, Chennai, Tamil Nadu, India, 600096


