NatWest Group

Site Reliability Engineer, AVP

Posted 3 Days Ago
3 Locations
Mid level
As a Site Reliability Engineer, you will manage production services, automate operational tasks, implement monitoring solutions, and ensure system resilience while minimizing disruption to customer journeys.
The summary above was generated by AI

Join us as a Site Reliability Engineer

  • You’ll be managing the provision of stable, resilient, reliable applications with the end goal of minimising disruption to Customer & Colleague Journeys (CCJ)
  • We’ll look to you to identify and automate manual tasks and implement observability solutions, ensuring a thorough understanding of CCJ across applications
  • This is a great chance to work in a supportive environment with opportunities to advance your personal and career development
  • We're offering this role at associate vice president level

What you'll do

As a Site Reliability Engineer, you’ll collaborate with feature teams to understand application changes, participate in delivery activities, and address production issues, helping to deliver change without negatively affecting the customer experience. You’ll also help to monitor and manage cloud costs, recommending optimisations and cost-saving measures.

You’ll be responding to, managing, and resolving incidents in a timely manner, performing root cause analysis and driving improvements to prevent recurrence. As well as this, you’ll automate routine operational tasks and cloud infrastructure provisioning using infrastructure as code (IaC) tools.

You’ll also be:

  • Conducting capacity planning exercises to make sure cloud resources can handle anticipated traffic spikes and growth
  • Implementing and maintaining monitoring, logging, and alerting systems that provide insight into the health and performance of cloud infrastructure and applications
  • Delivering automation solutions to minimise and eliminate manual tasks associated with maintaining and supporting the applications
  • Ensuring an in-depth understanding of the full tech stack on which the application resides and depends
  • Identifying alerting and monitoring requirements for an application, based on a sound understanding of customer journeys
  • Evaluating the resilience of the end-to-end tech stack on which the applications depend, and addressing weaknesses
  • Seeking to reduce the frequency of hand-offs in the end-to-end resolution of customer-impacting incidents

The skills you'll need

To succeed in this role, you’ll need experience of supporting live production services that serve customer journeys. You'll also need demonstrable knowledge of ITIL processes and IT security principles, along with the tools and techniques used to prevent compliance breaches.

On top of this, you’ll bring hands-on experience with Azure Cloud and full-stack observability using tools such as Log Analytics, Application Insights, Grafana, CloudWatch, Prometheus and Splunk.

You’ll also need:

  • Strong verbal and written communication skills
  • Strong hands-on experience with cloud platforms including AWS and GCP, and with services such as S3, Lambda and Kubernetes
  • Experience of managing production systems and incidents with a focus on minimising downtime and improving system resilience
  • Strong troubleshooting skills for cloud infrastructure and application performance issues
  • Experience of networking in the cloud and familiarity with Chaos Engineering principles and tools

Hours

45

Job Posting Closing Date:

16/04/2025

Top Skills

Application Insights
AWS
Azure Cloud
CloudWatch
GCP
Grafana
Kubernetes
Lambda
Log Analytics
Prometheus
S3
Splunk

NatWest Group Chennai, Tamil Nadu, IND Office

Kosmo One, Plot No 14, 3rd Main Road, Ambattur Industrial Estate, Chennai, Tamil Nadu, India, 600 058

Similar Jobs

Yesterday
Hybrid
4 Locations
Junior
Artificial Intelligence • Healthtech • Professional Services • Analytics • Consulting
Develop and implement technology solutions for clients, manage project phases, and collaborate within a global team while ensuring best practices.
Top Skills: AWS, Azure Cloud, Data Warehousing, ETL Service Platform, GCP, Informatica, Programming Languages, SQL, SSIS, Talend
Yesterday
Gurugram, Haryana, IND
Mid level
Information Technology • Software • Financial Services
The Site Reliability Engineer will support application migrations, diagnose problems, and manage infrastructure deployment while collaborating in a distributed environment.
Top Skills: Bash, Python, SQL, TCP/IP, UDP, Unix/Linux
Yesterday
Hybrid
3 Locations
Senior level
Artificial Intelligence • Healthtech • Professional Services • Analytics • Consulting
Lead AI Engineer responsible for developing and refining machine learning engineering platforms, building model pipelines, collaborating with teams, and ensuring high-quality code and deliverables.
Top Skills: Airflow, AWS, Azure, GCP, Kubeflow, MLflow, PySpark, Python, SageMaker, Scala, SQL

What you need to know about the Chennai Tech Scene

To locals, it's no secret that South India is leading the charge in big data infrastructure. While the environmental impact of data centers has long been a concern, emerging hubs like Chennai are favored by companies seeking ready access to renewable energy resources, which provide more sustainable and cost-effective solutions. As a result, Chennai, along with neighboring Bengaluru and Hyderabad, is poised for significant growth, with a projected 65 percent increase in data center capacity over the next decade.