We are seeking an on-site Senior Data Engineer to design, build, and maintain scalable, secure, and high-performance data infrastructure that powers analytics, AI/ML models, and enterprise applications. The role sits at the intersection of data engineering, applied machine learning support, and software systems, and works closely with the Senior AI & Software Manager to translate product, AI, and business requirements into robust data pipelines and platforms.
This role is delivery-focused and impact-driven, with strong ownership of data reliability, performance, and governance across cloud and distributed environments.
Responsibilities
Design, develop, and maintain end-to-end ETL/ELT pipelines for structured and unstructured data using Python and SQL.
Build scalable batch and near-real-time data workflows leveraging Apache Spark, Hadoop, Kafka, and Airflow.
Implement data ingestion, transformation, validation, and enrichment pipelines across multiple data sources (APIs, files, databases, streaming systems).
Ensure high data quality through automated checks, anomaly detection, and validation logic, including ML-assisted data quality monitoring.
Architect and manage cloud-based data solutions across AWS (S3, Glue, Redshift, EMR), GCP (BigQuery, Dataflow, Pub/Sub), and Azure (Data Factory).
Design and optimize data warehouses and analytical data models to support BI tools, AI workflows, and operational analytics.
Implement cost-efficient storage and compute strategies while maintaining performance and scalability.
Work closely with the Senior AI & Software Manager to prepare, structure, and optimize datasets for machine learning and predictive analytics.
Support ML pipelines by enabling feature engineering, training data generation, and inference-ready data flows.
Collaborate on integrating ML outputs into production systems and dashboards.
Ensure data pipelines align with AI model requirements for freshness, latency, and reliability.
Develop and maintain data services and APIs using FastAPI, Django REST, or Flask to expose data to applications and AI systems.
Collaborate with software engineers to integrate data pipelines into broader system architectures.
Ensure data platforms align with software engineering best practices (modularity, versioning, CI/CD readiness).
Enable downstream analytics and reporting through clean, well-modeled datasets.
Support BI and visualization tools such as Power BI and Looker by delivering optimized datasets and semantic layers.
Partner with stakeholders to translate analytical and operational needs into technical data requirements.
Implement data governance standards, access controls, and compliance measures, particularly for sensitive or regulated datasets.
Ensure data integrity, traceability, and auditability across pipelines and storage layers.
Collaborate on defining data documentation, lineage, and metadata practices.
Act as a senior technical partner to the Senior AI & Software Manager, contributing to architectural decisions and system design discussions.
Collaborate with data scientists, AI engineers, software developers, and non-technical stakeholders.
Provide technical guidance and mentorship to junior data engineers or analysts when required.
Participate in planning, estimation, and delivery of complex data-driven projects.
Requirements
Strong proficiency in Python and SQL for data engineering and analytics.
Hands-on experience with Apache Spark, Hadoop, Kafka, and Airflow.
Solid understanding of ETL/ELT design patterns, data modeling, and warehousing.
Experience with cloud data platforms (AWS, GCP, Azure).
Familiarity with machine learning workflows, including data preparation and feature engineering.
Experience building APIs and services using FastAPI, Django REST, or Flask.
Working knowledge of Docker, Kubernetes, and Git.
Experience supporting BI tools such as Power BI or Looker.
Proven experience delivering large-scale, production-grade data systems.
Experience working on multi-stakeholder, high-impact projects, including government or enterprise environments.
Demonstrated ability to reduce processing time, improve data quality, and scale data operations.
Track record of translating business or AI requirements into reliable technical solutions.
Degree in Engineering, Computer Science, Statistics, or a related technical field.
Formal training or certification in Data Science, Big Data, or Machine Learning is a strong advantage.



