The Data & AI Operations Specialist leads technical operations for AI infrastructure, manages data pipelines, and oversees MLOps across multi-cloud environments, ensuring compliance and performance optimization.
The Data & Operations AI Specialist serves as the Level 3 technical lead for the Artificial Intelligence and Data Platform estate. You will be responsible for the architecture, engineering, and advanced troubleshooting of AI infrastructure, data pipelines, and MLOps lifecycles across a multi-cloud environment (Azure and OCI).
Responsibilities
AI Infrastructure & Platform Engineering
- Design & Architecture: Maintain the monitoring architecture for AI/ML platforms and configure advanced dashboards in Grafana and Azure Monitor.
- Environment Governance: Manage Azure Machine Learning (AML) workspace configurations, compute targets, and Databricks cluster lifecycles (including runtime versions and platform patching).
- Resource Optimization: Oversee GPU resource allocation, reserved capacity, and cost-performance optimization to align with FinOps goals.
- Security Integration: Ensure all AI services use private endpoints, VNet integration, and RBAC to protect sensitive citizen data.
Data Pipeline & ETL Management
- Pipeline Engineering: Own the design, optimization, and remediation of Azure Data Factory (ADF) and Synapse pipelines.
- Advanced Troubleshooting: Resolve complex bottlenecks related to authentication failures, data format changes, and ETL performance.
- SOP Leadership: Author step-by-step Standard Operating Procedures (SOPs) for the L1 NOC team to handle routine monitoring and first-line triage.
MLOps & Model Lifecycle
- Automation: Implement CI/CD pipelines for model training, testing, and deployment to AML endpoints.
- Model Reliability: Configure data drift detection thresholds and automated retraining triggers.
- Recovery Operations: Develop self-healing scripts and automated recovery runbooks for critical AI workflows.
Governance & Compliance
- Audit Management: Implement and maintain audit logging for all AI decisions and model outputs, ensuring logs flow to the SIEM/vSOC.
- Regulatory Alignment: Conduct quarterly AI governance reviews to ensure compliance with NESA standards and data privacy guidelines.
Requirements
- AI/ML Platforms: Deep expertise in Azure Machine Learning and Databricks.
- Data Integration: Proficiency in Azure Data Factory and Synapse.
- Infrastructure-as-Code (IaC): Experience with Terraform or ARM Templates for reproducible deployments.
- Observability: Ability to use Dynatrace, Grafana, and Azure Monitor for deep-tier diagnostics.
- Containerization: Knowledge of AKS, Istio Service Mesh, and KEDA.
- ITIL Mastery: Strong understanding of ITIL-aligned Incident, Change, and Problem management.
- Security Mindset: Familiarity with NESA standards and UAE data residency requirements.
- Technical Writing: Ability to draft complex SOPs and Root Cause Analysis (RCA) documents within 48 hours of an incident.
- Certifications: Microsoft Azure Data Scientist Associate or Azure AI Engineer Associate is highly preferred.
Top Skills
AKS
ARM Templates
Azure Data Factory
Azure Machine Learning
Azure Monitor
Databricks
Dynatrace
Grafana
Istio Service Mesh
KEDA
Synapse
Terraform