The Data Engineer will architect data lakes, curate datasets, automate data services, and develop training labs, leveraging cloud technologies and ensuring data quality for AI applications in climate and health.
The Institute for Health Modeling and Climate Solutions (IMACS) is a global center of excellence, hosted by Malaria No More, with the mission to empower the world’s most climate-vulnerable countries with the tools, data, and expertise needed to predict, prevent, and respond to climate-sensitive health threats.
IMACS is redefining how climate intelligence is operationalized in public health by building and scaling AI-powered digital public goods that integrate and model climate and health data. Through the application of machine learning, interoperable platforms, and next-generation early warning systems, IMACS enables real-time risk detection and proactive responses at scale. IMACS supports countries through co-designed implementation pathways: orchestrating data cooperation, strengthening national health and climate information systems with tailored innovations, training frontline actors and policymakers, and institutionalizing their use through clear SOPs and sustainability guidelines. By unlocking the value of climate and health data, IMACS helps transform fragmented information into strategic, actionable knowledge, enabling smarter decisions, better preparedness, and more resilient health systems in the era of climate disruption.
Backed by the Patrick J. McGovern Foundation, we are building a Central Data & Analytics Hub (CDAH) to advance IMACS’ climate health AI foundation model and related digital public goods, as well as a training program to equip public health professionals with the knowledge and tools required to make data-informed decisions at the intersection of climate and health.
The CDAH will be a cloud-native, open-source “operating system” for integrated climate and health intelligence, built on five pillars:
- AI R&D environment: Ingests multi-modal climate, environmental, epidemiological and socio-demographic data into a unified data lake & feature store; supports Kubeflow/PyTorch/TensorFlow pipelines with MLflow registry, automated benchmarking, architecture search, transfer learning and uncertainty-aware modeling.
- Digital tool marketplace & public goods registry: User-facing portal for dashboards, mobile apps and alerting platforms; structured backend registry of pre-trained model packages, microservices, ETL scripts, governance adapters, metadata and version history.
- Systems integration & deployment layer: Middleware adapters and Kafka messaging to plug AI services into DHIS2, HMIS, IDSR and similar platforms; Terraform/Ansible IaC, identity management, end-to-end encryption and compliance with data-governance standards (an illustrative messaging sketch follows this list).
- Training environment: Web portal and virtual bootcamp infrastructure hosting open-access modules, instructor-led sessions, hands-on Jupyter labs, code templates and certification tracks on climate-health AI workflows and interoperability.
- Real-world evaluation sandbox: Controlled simulation environment replicating public-health workflows, climate variability and institutional constraints; structured feedback loops for piloting, validating and refining tools prior to full-scale rollout.
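To illustrate the systems-integration pillar, here is a minimal sketch of publishing a model-derived risk alert onto a Kafka topic for a downstream DHIS2/HMIS adapter to consume. The broker address, topic name, and payload fields are illustrative assumptions, not CDAH specifications.

```python
# Minimal sketch: publishing a model-derived risk alert to Kafka for a
# downstream DHIS2/HMIS adapter. Broker address, topic name, and payload
# schema are illustrative assumptions, not CDAH specifications.
import json
from datetime import datetime, timezone

from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",          # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

alert = {
    "org_unit": "district-042",                  # hypothetical DHIS2 org unit
    "indicator": "malaria_outbreak_risk",
    "risk_score": 0.87,
    "model_version": "v0.3.1",
    "issued_at": datetime.now(timezone.utc).isoformat(),
}

# Send to the (assumed) alerts topic; a middleware adapter would map this
# payload onto DHIS2 data values or an IDSR event downstream.
producer.send("climate-health.alerts", value=alert)
producer.flush()
```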
What You’ll Do
- Architect the data backbone: Lead design of a multi-tenant data lake & feature store; define schemas, metadata standards, and secure ETL/ELT pipelines for climate, environmental, epidemiological, and socio-demographic data.
- Source & curate open-source datasets: Identify, evaluate and onboard public climate, environmental, epidemiological and socio-demographic data (e.g., ERA5/Copernicus, MODIS, WHO, UN, university repositories, open-API feeds), ensuring metadata completeness and licensing compliance for downstream model training.
- Automate data quality assurance & governance: Build unit/integration tests and data-quality checks (Great Expectations/dbt), track lineage, and enforce access controls (see the data-quality sketch after this list).
- Ingest and harmonize datasets: Operationalize ingestion, cleansing, and harmonization of ERA5, Sentinel, GPM, EHR, mobility, and demographic datasets; ensure interoperability with DHIS2/HMIS (an ingestion sketch follows this list).
- Automate data services: Develop reusable validation libraries, transformation scripts, and secure REST/GraphQL APIs to power downstream AI models and dashboards. Manage the data-service API contract; the AI/ML Engineer manages model APIs (an illustrative endpoint sketch follows this list).
- Develop training labs: Author reference ETL scripts, notebooks, and architecture patterns for “AI-ready” datasets; validate that bootcamp exercises reflect real-world data challenges.
- Co-lead bootcamps: Guide participants through hands-on ETL labs, troubleshoot integration issues, and refine training materials based on feedback.
- Publish open-source components: Package and release ETL modules, transformation libraries, and interoperability adapters to the public-goods registry under permissive licenses.
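For the ingestion and harmonization work above, here is a minimal sketch of a daily Airflow task that pulls a small ERA5 extract from the Copernicus Climate Data Store and stages it for the data lake. The dataset request, bounding box, schedule, and staging path are illustrative assumptions (Airflow 2.4+ and the cdsapi client are assumed); a real DAG would add retries, backfills, and harmonization steps.

```python
# Sketch: a daily Airflow task that pulls a small ERA5 extract from the
# Copernicus Climate Data Store and stages it for the data lake. Request
# fields, bounding box, schedule, and paths are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_era5(ds: str, **_) -> None:
    """Retrieve one day of ERA5 2 m temperature for an example bounding box."""
    import cdsapi  # Copernicus CDS client; requires ~/.cdsapirc credentials

    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels",
        {
            "product_type": "reanalysis",
            "variable": "2m_temperature",
            "date": ds,                        # Airflow's execution date
            "time": [f"{h:02d}:00" for h in range(24)],
            "area": [5, 33, -5, 42],           # N, W, S, E (example: East Africa)
            "format": "netcdf",                # key name may differ on newer CDS
        },
        f"/tmp/era5_t2m_{ds}.nc",              # staging path before loading to the lake
    )


with DAG(
    dag_id="era5_daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",        # Airflow >= 2.4 parameter name
    catchup=False,
) as dag:
    PythonOperator(task_id="fetch_era5", python_callable=fetch_era5)
```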
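For the data-quality automation described above, a minimal sketch using the classic pandas-backed Great Expectations API (the newer GX 1.x fluent API differs); column names and thresholds are hypothetical.

```python
# Sketch: automated data-quality checks on a harmonized climate-health table,
# using the classic pandas-backed Great Expectations API. Column names and
# thresholds are hypothetical; a production suite would run as a checkpoint
# inside the pipeline orchestrator.
import pandas as pd
import great_expectations as ge

df = pd.DataFrame(
    {
        "district": ["A", "B", "C"],
        "week": ["2024-W01", "2024-W01", "2024-W02"],
        "t2m_mean_c": [24.1, 41.0, 27.8],
        "malaria_cases": [12, 3, 0],
    }
)

batch = ge.from_pandas(df)
batch.expect_column_values_to_not_be_null("district")
batch.expect_column_values_to_be_between("t2m_mean_c", min_value=-50, max_value=60)
batch.expect_column_values_to_be_between("malaria_cases", min_value=0)

results = batch.validate()
if not results["success"]:
    raise ValueError("Data-quality checks failed; blocking downstream load.")
```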
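And for the data-services responsibility, a sketch of a small read-only REST endpoint of the kind that might expose harmonized indicators to models and dashboards; the route, parameters, and in-memory stand-in store are hypothetical, and a real service would query the feature store and sit behind the platform's identity management.

```python
# Sketch: a small read-only REST endpoint exposing harmonized climate-health
# indicators to downstream models and dashboards. Route, parameters, and the
# in-memory "store" are hypothetical stand-ins for a feature-store query.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="CDAH data service (sketch)")


class IndicatorRecord(BaseModel):
    district: str
    week: str
    t2m_mean_c: float
    malaria_cases: int


# Stand-in for a feature-store or warehouse query.
_FAKE_STORE = {
    ("district-042", "2024-W01"): IndicatorRecord(
        district="district-042", week="2024-W01", t2m_mean_c=24.1, malaria_cases=12
    )
}


@app.get("/v1/indicators/{district}/{week}", response_model=IndicatorRecord)
def get_indicators(district: str, week: str) -> IndicatorRecord:
    record = _FAKE_STORE.get((district, week))
    if record is None:
        raise HTTPException(status_code=404, detail="No data for that district/week")
    return record
```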
What We’re Looking For
- Deep technical expertise: 8+ years in data engineering, with a strong track record designing and operating large-scale data lakes and pipelines. Demonstrated experience discovering, evaluating and integrating diverse open-source data streams for ML pipelines.
- DataOps & cloud proficiency: Expertise in Python/SQL, Spark/Flink, Airflow, dbt, Kafka, Docker, Kubernetes, CI/CD (GitOps), and AWS/Azure/GCP.
- API & microservices: Proven ability to design, implement, and secure RESTful APIs and data service micro-architectures.
- Consulting acumen: Exceptional stakeholder management, technical storytelling, and client-facing presentation skills, ideally honed at a top-tier consulting firm or tech organization.
- Autonomous delivery: Demonstrated capacity to own complex projects end-to-end, navigate ambiguity, and deliver production-ready solutions with minimal oversight.
Preferred Qualifications
- Prior engagement in global health, One Health, or climate-health data initiatives.
- Familiarity with data-governance frameworks (e.g., GDPR, HIPAA) and cybersecurity best practices.
- Experience designing and delivering technical training or bootcamps.
- Contributions to open-source digital public goods or curated registries.
Why You’ll Love This Role
- High-impact mission: Your work will directly strengthen early warning systems and resilience in climate-vulnerable regions.
- Technical leadership: Own the design and delivery of the CDAH's data backbone.
- Innovation-friendly environment: Leverage cutting-edge Big Data and cloud technologies in a dynamic, open-source ecosystem.
- Global collaboration: Engage a diverse network of public-health experts, policymakers, and open-source communities.
Please submit your résumé, a brief cover letter outlining your most relevant data projects/consulting engagements, and links to GitHub repos or model demos.
Malaria No More is an equal opportunity employer, and all qualified applicants will be considered without regard to race, color, religion, sex, disability status, sexual orientation, gender identity, national origin, veteran status, or any other characteristic protected by law. We are committed to fostering a diverse and inclusive workplace and provide equal opportunities in all terms and conditions of employment.
Top Skills
Airflow
AWS
Azure
dbt
Docker
Flink
GCP
Kafka
Kubernetes
Python
Spark
SQL