Design and optimize LLM systems, manage scalable infrastructure, implement CI/CD and automation, and ensure system reliability and compliance.
Company Description
👋🏼 We're Nagarro
We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale across all devices and digital mediums, and our people exist everywhere in the world (17,500 experts across 36 countries, to be exact). Our work culture is dynamic and non-hierarchical. We are looking for great new colleagues. That is where you come in!
Job Description
Requirements:
- Experience: 7.5+ years
- 10-12 years in infrastructure, platform, DevOps, or MLOps roles
- Strong experience with cloud platforms (AWS/GCP/Azure) and Kubernetes
- Hands-on experience deploying and operating LLMs (OpenAI, Anthropic, open-source models)
- Proficiency with GPU infrastructure, model serving frameworks, and vector databases
- Strong programming skills in Python; experience with Bash/Go is a plus
- Experience with monitoring, logging, and performance tuning for distributed systems
Preferred Qualifications:
- Experience with LLM fine-tuning, RAG pipelines, and prompt/version management
- Familiarity with tools like Terraform, Helm, Argo, Ray, or similar
- Exposure to cost optimization strategies for large-scale AI systems
Responsibilities:
- Design and manage scalable infrastructure for training, fine-tuning, serving, and monitoring LLMs
- Build and maintain LLMOps pipelines (deployment, versioning, rollback, monitoring, evaluation)
- Optimize inference performance (latency, throughput, cost) across GPU/accelerator stacks
- Implement CI/CD, IaC, and automation for AI/ML workloads
- Ensure observability, reliability, and governance of LLM systems in production
- Collaborate with ML, platform, and product teams to operationalize AI solutions
- Manage security, compliance, and access control for model and data pipelines
Qualifications:
- Bachelor’s or master’s degree in Computer Science, Information Technology, or a related field
Top Skills
AWS, GCP, Azure, Kubernetes, Python, Bash, Go, TensorFlow, PyTorch, Terraform, Helm, Argo, Ray
Nagarro Chennai, Tamil Nadu, IND Office
AWFIS, 111, Rajiv Gandhi Road, Old Mahabalipuram Road, Kottiwakkam Village, OMR India, Chennai, India, 600041