NVIDIA

Principal Staff SRE - Core Infrastructure

Posted 8 Days Ago
In-Office
Bengaluru, Bengaluru Urban, Karnataka
Senior level
The Principal Staff SRE will lead initiatives in transforming IT compute architecture, design and deploy core services, and optimize performance through data analysis and collaboration with various teams.
The summary above was generated by AI

NVIDIA has been reinventing computer graphics, PC gaming, and accelerated computing for 30 years. It is a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, generative AI, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.

We are seeking a highly skilled Principal Staff SRE to join our dynamic team. Our company is at the forefront of technological innovation, and we are dedicated to driving efficiency and optimizing the performance of our infrastructure, both on-prem and in the cloud. Join us in this exciting endeavor!

What You Will Be Doing:

  • Lead initiatives to transform the IT Compute Core Team's architecture and build new service offerings across on-prem and cloud.

  • Design, scale, and deploy core infrastructure services including DNS, NTP/PTP, DHCP, and LDAP. This includes building for performance and reliability at global scale, covering automation, monitoring, high availability, capacity planning, and lifecycle management.

  • Define and implement metrics to measure the efficiency of services, and drive efficiency with software and hardware optimizations (SR-IOV/DPU).

  • Apply technologies such as eBPF and XDP for observability and DDoS mitigation.

  • Collect and review system data for capacity planning, analyze capacity data and develop plans for enterprise-wide systems at the appropriate level, and coordinate with management personnel in implementing changes.

  • Develop and maintain tools for collecting, analyzing, and visualizing data for reporting, alerting, and monitoring.

  • Collaborate with NVIDIA leadership, senior engineers, program managers, and product managers to develop compelling IT products and services that meet customer needs.
     

What We Need To See:

  • Bachelor’s degree in Engineering, Computer Science, Mathematics, or related field, or equivalent experience

  • 12+ years of proven experience in compute platform engineering with a focus on automation.

  • Experience designing and deploying containerization architectures and distributed systems infrastructure.

  • Proven experience evaluating existing application architectures and identifying opportunities for containerization to improve scalability, reliability, and efficiency.

  • Strong analytical skills with the ability to define and track key performance metrics.

  • Experience developing tools for data analysis and performance profiling, and developing with Terraform and configuration management tools.

  • Proficiency in programming languages such as Go and/or Python.

  • Linux OS proficiency, including kernel internals.

  • Experience running large environments consisting of bare-metal build infrastructure.

  • Understanding of network protocols and architectures (VLAN/VXLAN/SDN/BGP/anycast).
     

Ways To Stand Out From The Crowd:

  • Deep understanding of other infrastructure components such as DNS, LDAP, and security tools.

  • Hands-on experience with containers and their implementation.

  • Experience deploying and managing services such as DNS and LDAP at scale.

  • Solid understanding of microservices architecture, infrastructure as code (IaC) and configuration management tools.

Top Skills

eBPF
Go
Linux OS
Python
Terraform
XDP
