Xero

Lead Site Reliability Engineer (Technical Duty Officer)

Posted 21 Days Ago
Remote or Hybrid
5 Locations
Senior level
Lead a team focused on incident and problem management, driving consistent and effective responses to high-severity incidents through strong technical leadership and communication.
The summary above was generated by AI
Our Purpose 

At Xero, we’re here to help you supercharge your business. We do this by automating routine tasks, surfacing actionable insights and connecting businesses with the right data, advisors and apps. When that happens, we’re not only making life better for small business, we’re building a stronger economy that can change the world.

About the team

Xero’s Incident and Problem Management team is part of the Site Reliability Engineering (SRE) organization and is responsible for building, delivering and maintaining robust process and tooling around incident management.

The team is responsible for driving enduring reliability at Xero through robust, consistent and fast response to high-severity incidents. They are responsible for building a world-class process and ensuring that process matures as the demands of the business grow.

About the role

We're looking for a Lead Engineer to join Xero’s Incident and Problem Management team. This position requires an experienced SRE professional with a strong technical background, a passion for building and delivering robust processes, and extensive experience leading the technical response to high-severity cloud issues.

You will drive best practice across the business and contribute to the ongoing transformation of the Xero SRE culture. As an expert communicator, you will lead technical discussions to identify and track the actions that arise during incidents.

Across our SRE function, we're looking for people who are keen to dig into the causes of incidents, proactively examine potential causes of future incidents, and work with engineering teams to remove the risk of those failure scenarios. The ultimate aim is to build playbooks and automation that ensure quick and effective responses. You will also provide ongoing training across the business so that the process is well understood and followed.

This role will form the backbone of a new team, providing a Technical Duty Officer (TDO) function within the business. TDOs are incident commanders who use SRE skillsets to drive fast mitigation and enduring resolution of impactful events.

What you'll do:

  • Own the incident management process, ensuring it drives enduring reliability across all products and services within Xero.
  • Provide expert leadership during critical outages, coordinating multiple teams to ensure streamlined decision-making and quick resolution.
  • Lead and advocate for the transformation to a world-leading SRE organization, promoting SRE principles within the Engineering Department.
  • Promote a customer-focused approach by addressing and mitigating global customer environment issues, and fostering a culture of continuous learning and technical excellence within the SRE team.
  • Develop and implement scalable process frameworks and observability strategies to ensure rapid problem diagnosis, response, and service reliability.
  • Collaborate with product teams to thoroughly analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency.

What you'll bring:

  • Previous career experience as a Site Reliability Engineer, in an Operations or Engineering environment
  • Strong hands-on coding experience (preferably Python) and knowledge of software engineering best practice
  • Hands-on experience troubleshooting AWS hosted services
  • Networking knowledge and the ability to troubleshoot TCP/IP, SSL/TLS, DNSSEC, IPsec, and BGP issues
  • Strong communication (oral & written) skills including the ability to translate technical issues/concepts into agreed actions

Why Xero? 

We offer very generous paid leave to use however you’d like (plus statutory holidays!), dedicated paid leave to care for your physical and mental wellbeing, and an Employee Assistance Program that gives you and your family access to mental health care. We also provide health insurance, life insurance, and income protection.

We offer wellbeing and sports programmes, employee resource groups, 26 weeks of paid parental leave for primary caregivers, an Employee Share Plan, beautiful offices, flexible working, career development, and many other benefits that reflect our human value.

You’ll do the best work of your life at Xero!

Top Skills

AWS
BGP
DNSSEC
IPsec
Python
SSL/TLS
TCP/IP
