Join us as a Production Analyst
- This is an opportunity to make a real impact and be pivotal in the success of our business, while benefiting from great variety and stakeholder exposure
- We’ll look to you to deliver a complex and critical production management, infrastructure and application support service for relevant platforms, activities and processes across the domain
- Hone your existing analytical skills and advance your career in this exciting, fast-paced role
- We're offering this role at associate vice president level
What you'll do
As a Production Analyst, you’ll be responsible for system performance and uptime and for IT Digital operations, maintaining and enhancing systems’ operational efficiency with a focus on deployment automation and system optimization to ensure consistent performance and reliability. You’ll need robust, hands-on technical problem-solving skills and a strong desire to implement scalable and sustainable technological solutions.
- Anchor and provide strategic direction on technologies and solutions in Digital operations. Lead infrastructure and application builds and technical maintenance alongside the core engineering and delivery teams.
- Act as custodian of SRE SLOs, SLIs and error budgets. Application scalability and optimization: assist in designing and implementing scalable, highly available system architectures that handle increasing loads and user demands without compromising performance.
- Create and optimize CI/CD pipelines to automate testing and deployment processes, reducing the time from development to production and ensuring consistent quality control.
- Design, monitor and respond to system alerts. Monitor system performance, identify bottlenecks, and execute optimizations and permanent fixes.
- Manage incident response protocols, including on-call rotations. Conduct post-incident reviews to prevent recurrence and refine the system reliability framework.
- Provide primary operational support and engineering for multiple large-scale distributed software applications. Collaborate with development operations staff to create, monitor, and troubleshoot the system infrastructure.
- Increase system resilience and serve larger customer volumes with expert-level coding, bulletproof release, and change management skills. Improve automation and increase the system’s self-healing capability.
- Collect operating system data and report performance metrics to stakeholders. Manage cloud and database system maintenance, and debug production issues as they arise.
- Ensure the effective and seamless integration of security policies and practices into DevOps workflows to reduce overall risk and deliver products and services on time.
- Implement E2E automated VAPT for any new or existing application. Reduce planned deployment downtime by 50% by ensuring a robust CI/CD setup.
- Reduce MTTR (mean time to recovery) to less than 2 hours for any major issue, and MTTD (mean time to detect) to less than 5 minutes with the help of automated tools and methods.
Your role will also involve:
- Collaborating with product development and feature teams to understand the upcoming product, enabling continuous integration and continuous deployment to occur
- Regularly attending the feature teams’ refinement and planning sessions
- Identifying areas for service improvement by analysing and diagnosing recurring platform and service incidents, as well as customer and stakeholder feedback
- Building a culture of continuous improvement to reinforce the robustness of the domain, with a focus on automation, scalability, continuous integration and continuous delivery
The skills you'll need
We’re looking for someone with technical knowledge and experience across platforms, technologies, products and domains.
- You'll have 12+ years of strong experience in DevSecOps and SRE, including production support.
- Proven experience in managing large-scale distributed systems and understanding the principles of scalability and reliability.
- Ownership of DevOps DORA metrics and SRE toil reduction through automation.
- Experience with security tools such as SAST, DAST and container security; an understanding of Node.js, React.js, Java, Oracle and IDMC; and experience with infrastructure as code tools such as Terraform and CloudFormation.
- Experience with container technologies such as Docker, Kubernetes and OpenShift. Must have knowledge of DevSecOps tools such as Git, Maven, Selenium, Jenkins, Ansible and security tools.
- Knowledge of at least one monitoring tool: Geneos, Nagios, Prometheus, Dynatrace, AppDynamics, DX-APM or Splunk.
- Scripting knowledge: UNIX shell (Python, Groovy and YAML are good to have).
- Experience with and understanding of at least one cloud provider such as AWS or Azure, including on-demand infrastructure provisioning, environment spin-offs, environment cloning, EKS and infrastructure as code.
- Hands-on working knowledge of configuring SLAs, SLOs and SLIs, along with infrastructure and business rules and logic, in tools such as AppDynamics, AWS CloudWatch, Pingdom, Datadog and Tivoli (preferably APM tools).
- An understanding of network protocols, load balancing, and firewall management for secure and efficient network operations.
Hours
45
Job Posting Closing Date:
20/04/2025
NatWest Group Chennai, Tamil Nadu, IND Office
Kosmo One, Plot No 14 3rd Main Road, Ambattur Industrial Estate, Chennai, Tamil Nadu, India, 600 058