The AI Platform Engineer will design and optimize AI/ML infrastructure, automate pipelines, and manage Kubernetes clusters to support machine learning model deployment and team collaboration.
AHEAD builds platforms for digital business. By weaving together advances in cloud infrastructure, automation and analytics, and software delivery, we help enterprises deliver on the promise of digital transformation.
At AHEAD, we prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard. We create spaces to empower everyone to speak up, make change, and drive the culture at AHEAD.
We are an equal opportunity employer and do not discriminate based on an individual's race, national origin, color, gender, gender identity, gender expression, sexual orientation, religion, age, disability, marital status, or any other protected characteristic under applicable law, whether actual or perceived.
We embrace all candidates who will contribute to the diversification and enrichment of ideas and perspectives at AHEAD.
We are seeking an experienced AI Platform Engineer to design, deploy, and optimize AI/ML infrastructure, AI workflows, and automated pipelines. This role focuses on building scalable environments for training and deploying machine learning models, leveraging modern orchestration, automation, and GPU acceleration technologies. You will collaborate with data scientists and platform engineers to drive efficient resource utilization and scalable operations across cloud and hybrid environments.
Key Responsibilities
- Kubernetes for AI/ML: Architect and manage Kubernetes clusters tailored to AI/ML workloads.
- GPU Orchestration: Implement Run:ai and operators for GPU resource orchestration and workload scheduling.
- Automation & Pipelines: Develop and maintain Python-based automation scripts and ML pipelines; automate infrastructure provisioning with Terraform and configuration management with Ansible.
- Notebooks & Collaboration: Create and manage Jupyter Notebooks for experimentation and collaboration.
- NVIDIA Integration: Integrate and optimize NVIDIA Enterprise Suite components (CUDA, NeMo Framework, Triton, TensorRT, GPU drivers) for accelerated computing.
- MLOps Practices: Establish and maintain MLOps best practices for model lifecycle management, CI/CD, and monitoring (e.g., MLflow, Kubeflow).
- Collaboration: Work closely with data scientists and platform engineers to ensure efficient resource utilization and scalability across environments.
Required Skills & Experience
- Strong proficiency in Python and experience with ML frameworks (TensorFlow, PyTorch).
- Hands-on experience with Kubernetes and container orchestration.
- Familiarity with Run:ai or similar GPU scheduling platforms.
- Expertise in Terraform and Ansible for infrastructure automation.
- Experience with Jupyter Notebooks for ML development.
- Knowledge of NVIDIA Enterprise Suite (CUDA, NeMo Framework, Triton, GPU drivers).
- Solid understanding of MLOps principles and tools (e.g., MLflow, Kubeflow).
- Background in deploying and scaling AI workloads in cloud or hybrid environments.
Qualifications
- 4+ years in platform architecture or solutions architecture, with 2+ years focused on AI/ML workloads.
- Experience with high-performance computing (HPC) environments.
- Familiarity with distributed training and model optimization techniques.
- Certification in Kubernetes or cloud platforms (AWS, Azure, GCP).
Why AHEAD:
Through our daily work and internal groups like Moving Women AHEAD and RISE AHEAD, we value and benefit from diversity of people, ideas, experience, and everything in between.
We fuel growth by stocking our office with top-notch technologies in a multi-million-dollar lab, by encouraging cross-department training and development, and by sponsoring certifications and credentials for continued learning.
India Employment Benefits include:
- Comprehensive health insurance coverage for employees, with options to extend coverage to dependents
- Paid time off and company holidays, along with additional leave benefits as per policy
- Flexible work arrangements, supporting work-life balance
- Learning and development opportunities to support continuous growth and upskilling
- Employee wellness initiatives and programs focused on physical and mental well-being
- Retirement and statutory benefits in line with India regulations
- Inclusive and people-first culture, with a strong focus on collaboration and ownership
Top Skills
- Ansible
- CUDA
- GPU Drivers
- Jupyter Notebooks
- Kubeflow
- Kubernetes
- MLflow
- NeMo Framework
- Python
- PyTorch
- Run:ai
- TensorFlow
- Terraform
- Triton