Design and operate software for Cloudflare's observability, improve Metrics & Alerting, and work on scalable systems while mentoring others.
Available Locations: London or Lisbon
About the Department
Production Engineering is responsible for the world's most reliable, observable, performant, and safe network ecosystem. Our customers rely on our products and systems to safely modify, troubleshoot, and release products without external impact.
Our external customers rely on us to provide seamless and predictable incident, traffic, and policy management, resulting in the fastest and safest network services in the world.
We are accountable for the overall performance of internal and external facing services, guiding our product teams to optimal configurations and maximum efficiency. From the moment that a packet enters the Cloudflare ecosystem, we know exactly what its expected purpose and behaviour are, and we are capable of determining and exposing anomalous behaviour.
The Cloudflare network makes it possible to solve challenges at a scale and efficiency that would be impossible for almost any other organization.
In this role, you can expect to:
- Design, deliver, and operate software that progresses Cloudflare's Observability competency
- Solve scaling bottlenecks in critical services in our Metrics & Alerting pipeline
- Work on highly distributed and scalable systems
- Participate in the constant cycle of knowledge sharing and mentoring
- Participate in the global on-call rotation for the services your team owns
- Research and introduce cutting-edge technologies
- Contribute to open-source
We are a small, well-funded, growing team focused on building an extraordinary company. This is a systems engineering role and a superb opportunity to join a high-performing team, support Cloudflare's mission, and help build a better Internet.
You may be a good fit for our team if you have:
- Proficiency in distributed Linux environments
- Proficiency in designing high-scale distributed systems
- Proficiency in high-level programming languages (e.g., Golang)
- Proficiency in Prometheus, Alertmanager, Thanos
- Proficiency in networking protocols across Layers 2-7 of the OSI model
- Experience working in a fast, high-growth environment
- Experience working in a 24/7/365 service environment
- Exquisite written and verbal communication skills
- Familiarity with Internetworking and BGP
- Strong bias for action
Bonus points if you have:
- Experience with high-bandwidth transit Internetworking and routing
- Passion for code simplicity and performance
Top Skills
Alertmanager
BGP
Go
Linux
Prometheus
Thanos