As a Hardware Systems Engineer, you'll troubleshoot and maintain Cloudflare's server fleet, validate firmware updates, and enhance automation tools.
Available Locations: Bengaluru
About the department
Cloudflare's Infrastructure group is responsible for building our global network. Our Hardware Engineering team helps research, develop, test, and deploy new equipment enabling 20% of the world's internet traffic to be served smoothly. Deployed across 330 cities in 120+ countries, the hardware we select helps improve the security, reliability, and performance of the Internet.
About the Role
We need to make thoughtful infrastructure choices affecting a significant portion of the Internet. Hardware we work with includes servers and components, as well as PDUs and network hardware. As a Hardware Systems Engineer, you will work with colleagues on the Hardware Engineering, Product, and Hardware Sourcing teams to troubleshoot and maintain Cloudflare's worldwide fleet of storage and compute servers.
What you'll do
- Work with software teams to validate bug fixes and assess performance of new firmware revisions
- Validate and deploy firmware updates to the fleet, monitoring the progress of the rollout for compliance and reliability
- Work with server and component vendors to obtain, debug, and maintain the latest updates
- Work with our Site Reliability Engineering teams to triage hardware problem reports
- Support our Data Centre Engineering teams in resolving hardware issues
- Develop and maintain automation tools to update firmware on servers and components in Cloudflare's fleet
- Communicate your results and updates through blog posts, internal talks, and tickets
Examples of desirable skills, knowledge and experience
- Bachelor's degree in Computer Engineering, Electrical Engineering, or Computer Science
- Desire to learn about the Cloudflare hardware used by 20% of all websites
- Desire to learn how a diverse server fleet is managed at scale
- Desire to learn the tools Cloudflare uses to maintain and monitor our hardware
- Knowledge of Bash and Python and basic Linux task automation
- Knowledge of x86 server hardware, including motherboards, CPUs, memory, storage, and firmware updates. Knowledge of other platforms such as Arm is a bonus.
- Knowledge of configuration management principles; in particular, we use Salt to manage our fleet
- Knowledge of Redfish, IPMI and server remote management protocols
- Knowledge of running mission-critical production systems
Bonus Points
- Familiarity with server hardware architecture
- Knowledge of debugging server hardware faults and the ability to engage with our sourcing team and vendors to improve quality
- Experience of managing large fleets comprising thousands of servers
- Experience of observability and monitoring tools such as Prometheus and Grafana, and the ability to observe trends over time
- Experience with software development tools and processes such as Git, Bitbucket, TeamCity, and Jira
Top Skills
Bash
Bitbucket
Git
Grafana
IPMI
JIRA
Linux
Prometheus
Python
Redfish
Salt
TeamCity
X86 Server Hardware