The Data Engineer will architect data lakes, curate datasets, automate data services, and develop training labs, leveraging cloud technologies and ensuring data quality for AI applications in climate and health.
The Institute for Health Modeling and Climate Solutions (IMACS) is a global center of excellence, hosted by Malaria No More, with the mission to empower the world’s most climate-vulnerable countries with the tools, data, and expertise needed to predict, prevent, and respond to climate-sensitive health threats.
IMACS is redefining how climate intelligence is operationalized in public health by building and scaling AI-powered digital public goods that integrate and model climate and health data. Through the application of machine learning, interoperable platforms, and next-generation early warning systems, IMACS enables real-time risk detection and proactive responses at scale. IMACS supports countries through co-designed implementation pathways: orchestrating data cooperation, strengthening national health and climate information systems with tailored innovations, training frontline actors and policymakers, and institutionalizing their use through clear SOPs and sustainability guidelines. By unlocking the value of climate and health data, IMACS helps transform fragmented information into strategic, actionable knowledge, enabling smarter decisions, better preparedness, and more resilient health systems in the era of climate disruption.
Backed by the Patrick J. McGovern Foundation, we are building a Central Data & Analytics Hub (CDAH) to advance IMACS’ climate health AI foundation model and related digital public goods, as well as a training program to equip public health professionals with the knowledge and tools required to make data-informed decisions at the intersection of climate and health.
The CDAH will be a cloud-native, open-source “operating system” for integrated climate and health intelligence, built on five pillars:
- AI R&D environment: Ingests multi-modal climate, environmental, epidemiological and socio-demographic data into a unified data lake & feature store; supports Kubeflow/PyTorch/TensorFlow pipelines with MLflow registry, automated benchmarking, architecture search, transfer learning and uncertainty-aware modeling.
- Digital tool marketplace & public goods registry: User-facing portal for dashboards, mobile apps and alerting platforms; structured backend registry of pre-trained model packages, microservices, ETL scripts, governance adapters, metadata and version history.
- Systems integration & deployment layer: Middleware adapters and Kafka messaging to plug AI services into DHIS2, HMIS, IDSR and similar platforms; Terraform/Ansible IaC, identity management, end-to-end encryption and compliance with data-governance standards (an illustrative messaging sketch follows this list).
- Training environment: Web portal and virtual bootcamp infrastructure hosting open-access modules, instructor-led sessions, hands-on Jupyter labs, code templates and certification tracks on climate-health AI workflows and interoperability.
- Real-world evaluation sandbox: Controlled simulation environment replicating public-health workflows, climate variability and institutional constraints; structured feedback loops for piloting, validating and refining tools prior to full-scale rollout.
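To illustrate the systems-integration pillar, here is a minimal sketch of publishing a model-derived risk alert onto a Kafka topic for a downstream DHIS2/HMIS adapter to consume. The broker address, topic name, and payload fields are illustrative assumptions, not CDAH specifications.

```python
# Minimal sketch: publishing a model-derived risk alert to Kafka for a
# downstream DHIS2/HMIS adapter. Broker address, topic name, and payload
# schema are illustrative assumptions, not CDAH specifications.
import json
from datetime import datetime, timezone

from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",          # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

alert = {
    "org_unit": "district-042",                  # hypothetical DHIS2 org unit
    "indicator": "malaria_outbreak_risk",
    "risk_score": 0.87,
    "model_version": "v0.3.1",
    "issued_at": datetime.now(timezone.utc).isoformat(),
}

# Send to the (assumed) alerts topic; a middleware adapter would map this
# payload onto DHIS2 data values or an IDSR event downstream.
producer.send("climate-health.alerts", value=alert)
producer.flush()
```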
What You’ll Do
- Architect the data backbone: Lead design of a multi-tenant data lake & feature store; define schemas, metadata standards, and secure ETL/ELT pipelines for climate, environmental, epidemiological, and socio-demographic data.
- Source & curate open-source datasets: Identify, evaluate and onboard public climate, environmental, epidemiological and socio-demographic data (e.g., ERA5/Copernicus, MODIS, WHO, UN, university repositories, open-API feeds), ensuring metadata completeness and licensing compliance for downstream model training.
- Automate data quality assurance & governance: Build unit/integration tests and data-quality checks (Great Expectations/dbt), track lineage, and enforce access controls (see the data-quality sketch after this list).
- Ingest and harmonize datasets: Operationalize ingestion, cleansing, and harmonization of ERA5, Sentinel, GPM, EHR, mobility, and demographic datasets; ensure interoperability with DHIS2/HMIS (an ingestion sketch follows this list).
- Automate data services: Develop reusable validation libraries, transformation scripts, and secure REST/GraphQL APIs to power downstream AI models and dashboards. Manage the data-service API contract; the AI/ML Engineer manages model APIs (an illustrative endpoint sketch follows this list).
- Develop training labs: Author reference ETL scripts, notebooks, and architecture patterns for “AI-ready” datasets; validate that bootcamp exercises reflect real-world data challenges.
- Co-lead bootcamps: Guide participants through hands-on ETL labs, troubleshoot integration issues, and refine training materials based on feedback.
- Publish open-source components: Package and release ETL modules, transformation libraries, and interoperability adapters to the public-goods registry under permissive licenses.
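For the ingestion and harmonization work above, here is a minimal sketch of a daily Airflow task that pulls a small ERA5 extract from the Copernicus Climate Data Store and stages it for the data lake. The dataset request, bounding box, schedule, and staging path are illustrative assumptions (Airflow 2.4+ and the cdsapi client are assumed); a real DAG would add retries, backfills, and harmonization steps.

```python
# Sketch: a daily Airflow task that pulls a small ERA5 extract from the
# Copernicus Climate Data Store and stages it for the data lake. Request
# fields, bounding box, schedule, and paths are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_era5(ds: str, **_) -> None:
    """Retrieve one day of ERA5 2 m temperature for an example bounding box."""
    import cdsapi  # Copernicus CDS client; requires ~/.cdsapirc credentials

    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels",
        {
            "product_type": "reanalysis",
            "variable": "2m_temperature",
            "date": ds,                        # Airflow's execution date
            "time": [f"{h:02d}:00" for h in range(24)],
            "area": [5, 33, -5, 42],           # N, W, S, E (example: East Africa)
            "format": "netcdf",                # key name may differ on newer CDS
        },
        f"/tmp/era5_t2m_{ds}.nc",              # staging path before loading to the lake
    )


with DAG(
    dag_id="era5_daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",        # Airflow >= 2.4 parameter name
    catchup=False,
) as dag:
    PythonOperator(task_id="fetch_era5", python_callable=fetch_era5)
```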
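For the data-quality automation described above, a minimal sketch using the classic pandas-backed Great Expectations API (the newer GX 1.x fluent API differs); column names and thresholds are hypothetical.

```python
# Sketch: automated data-quality checks on a harmonized climate-health table,
# using the classic pandas-backed Great Expectations API. Column names and
# thresholds are hypothetical; a production suite would run as a checkpoint
# inside the pipeline orchestrator.
import pandas as pd
import great_expectations as ge

df = pd.DataFrame(
    {
        "district": ["A", "B", "C"],
        "week": ["2024-W01", "2024-W01", "2024-W02"],
        "t2m_mean_c": [24.1, 41.0, 27.8],
        "malaria_cases": [12, 3, 0],
    }
)

batch = ge.from_pandas(df)
batch.expect_column_values_to_not_be_null("district")
batch.expect_column_values_to_be_between("t2m_mean_c", min_value=-50, max_value=60)
batch.expect_column_values_to_be_between("malaria_cases", min_value=0)

results = batch.validate()
if not results["success"]:
    raise ValueError("Data-quality checks failed; blocking downstream load.")
```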
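And for the data-services responsibility, a sketch of a small read-only REST endpoint of the kind that might expose harmonized indicators to models and dashboards; the route, parameters, and in-memory stand-in store are hypothetical, and a real service would query the feature store and sit behind the platform's identity management.

```python
# Sketch: a small read-only REST endpoint exposing harmonized climate-health
# indicators to downstream models and dashboards. Route, parameters, and the
# in-memory "store" are hypothetical stand-ins for a feature-store query.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="CDAH data service (sketch)")


class IndicatorRecord(BaseModel):
    district: str
    week: str
    t2m_mean_c: float
    malaria_cases: int


# Stand-in for a feature-store or warehouse query.
_FAKE_STORE = {
    ("district-042", "2024-W01"): IndicatorRecord(
        district="district-042", week="2024-W01", t2m_mean_c=24.1, malaria_cases=12
    )
}


@app.get("/v1/indicators/{district}/{week}", response_model=IndicatorRecord)
def get_indicators(district: str, week: str) -> IndicatorRecord:
    record = _FAKE_STORE.get((district, week))
    if record is None:
        raise HTTPException(status_code=404, detail="No data for that district/week")
    return record
```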
What We’re Looking For
- Deep technical expertise: 8+ years in data engineering, with a strong track record designing and operating large-scale data lakes and pipelines. Demonstrated experience discovering, evaluating and integrating diverse open-source data streams for ML pipelines.
- DataOps & cloud proficiency: Expertise in Python/SQL, Spark/Flink, Airflow, dbt, Kafka, Docker, Kubernetes, CI/CD (GitOps), and AWS/Azure/GCP.
- API & microservices: Proven ability to design, implement, and secure RESTful APIs and data service micro-architectures.
- Consulting acumen: Exceptional stakeholder management, technical storytelling, and client-facing presentation skills, ideally honed at a top-tier consulting firm or tech organization.
- Autonomous delivery: Demonstrated capacity to own complex projects end-to-end, navigate ambiguity, and deliver production-ready solutions with minimal oversight.
Preferred Qualifications
- Prior engagement in global health, One Health, or climate-health data initiatives.
- Familiarity with data-governance frameworks (e.g., GDPR, HIPAA) and cybersecurity best practices.
- Experience designing and delivering technical training or bootcamps.
- Contributions to open-source digital public goods or curated registries.
Why You’ll Love This Role
- High-impact mission: Your work will directly strengthen early warning systems and resilience in climate-vulnerable regions.
- Technical leadership: Own the design and delivery of the CDAH's data backbone.
- Innovation-friendly environment: Leverage cutting-edge Big Data and cloud technologies in a dynamic, open-source ecosystem.
- Global collaboration: Engage a diverse network of public-health experts, policymakers, and open-source communities.
Please submit your résumé, a brief cover letter outlining your most relevant data projects/consulting engagements, and links to GitHub repos or model demos.
Malaria No More is an equal opportunity employer, and all qualified applicants will be considered without regard to race, color, religion, sex, disability status, sexual orientation, gender identity, national origin, veteran status, or any other characteristic protected by law. We are committed to fostering a diverse and inclusive workplace and provide equal opportunities in all terms and conditions of employment.
Top Skills
Airflow
AWS
Azure
dbt
Docker
Flink
GCP
Kafka
Kubernetes
Python
Spark
SQL