Trimble

Site Reliability Engineer

Posted Yesterday
Chennai, Tamil Nadu
Mid level
The Site Reliability Engineer will enhance the ERP product using AI MLOps principles, ensuring efficient deployment, security, and cloud cost optimization while collaborating globally.

Job Summary

We are seeking an experienced Site Reliability Engineer with AI MLOps expertise to support the development and optimization of our ERP product, primarily in Azure and Windows environments. This role combines MLOps expertise with Site Reliability Engineering (SRE) principles to ensure the reliable, scalable, and cost-efficient deployment of AI models. The ideal candidate will focus on improving security, compliance, and operational efficiency, collaborating with North American and global teams to meet business objectives.

Key Responsibilities

  • AI MLOps Pipeline: Build and optimize CI/CD pipelines to automate the training, testing, and deployment of AI models on Azure, with a strong emphasis on improving efficiency and reducing costs.

  • Azure Infrastructure Management: Manage and maintain scalable, secure infrastructure using Azure services like Azure Machine Learning, AKS, and Virtual Machines. Continuously optimize resource usage and implement cost-saving measures.

  • Windows Server Management: Oversee Windows-based servers hosted on Azure, ensuring they meet performance, security, and compliance requirements, while also identifying and executing cost-saving opportunities.

  • Cost Optimization: Analyze and manage infrastructure costs by identifying unused or underused resources and implementing optimization strategies to drive cost savings.

  • Monitoring & Performance Optimization: Monitor the health, performance, and costs of AI models and services using Azure Monitor, New Relic, and other tools. Identify performance bottlenecks and optimize for both operational efficiency and cost reduction.

  • Model Versioning & Governance: Assist in managing model version control, governance, and lifecycle processes with a focus on cost-effective operations.

  • Cross-functional Collaboration: Collaborate with data scientists, AI engineers, and software developers to support the efficient deployment and operationalization of AI models, while actively seeking ways to minimize costs.

  • Incident Management & Automation: Participate in incident resolution and automate tasks to reduce manual work, improve system reliability, and lower operational overhead.

  • Security & Compliance Assurance: Ensure AI/ML workloads comply with security and regulatory standards, implementing cost-efficient solutions to enhance security and data protection.

Qualifications

  • Experience: 2–5 years in MLOps, SRE, or similar roles, focusing on Azure and Windows environments.

  • Cloud Skills: Proficient in Azure services, managing infrastructure, and Windows workloads.

  • SRE Knowledge: Familiar with Site Reliability Engineering principles like monitoring and automation.

  • DevOps: Hands-on experience with CI/CD tools like Azure DevOps.

  • Scripting: Skilled in PowerShell and Python for automation.

  • Containers: Knowledge of Docker and Kubernetes for deploying AI/ML applications.

  • Windows Admin: Strong experience managing Windows Servers and related services.

  • AI/ML Knowledge: Understanding of AI/ML workflows and model deployment.

Nice-to-Have

  • Experience with Infrastructure-as-Code tools like Terraform.

  • Azure certifications (e.g., Azure AI Engineer, Azure DevOps Engineer).

  • Experience implementing cost-saving strategies in cloud environments.

Soft Skills

  • Strong problem-solving skills with the ability to troubleshoot complex issues.

  • Excellent communication skills and the ability to collaborate effectively with cross-functional teams.

  • A passion for innovation and continuous improvement in AI/ML systems.

Top Skills

AKS
Azure
Azure DevOps
Azure Machine Learning
CI/CD
Docker
Kubernetes
New Relic
PowerShell
Python
Terraform
Virtual Machines
Windows

Trimble Chennai, Tamil Nadu, IND Office

Rajiv Gandhi Street, Chennai, Tamil Nadu, India, 600113

Trimble Tharamani, Tamil Nadu, IND Office

No. 4 Rajiv Gandhi Salai, Tharamani, Chennai, India, 600 113
