Human Archive is a Y Combinator-backed research lab focused on modeling human embodied intelligence.
Humans are the most sophisticated biological systems we have ever observed, yet we still do not fully understand ourselves. Human physical intelligence, including the human hand, proprioception, and vision, remains a largely unsolved research problem. Our mission is to recover human embodied intelligence as a learned model. To achieve this, we build custom hardware products, deploy them globally at scale, and publish research. Today, our data is used for robotics and world modeling, but the broader opportunity is advancing scientific research into intelligence itself.
Founded by Stanford and UC Berkeley researchers, we are lean and deeply technical, and we operate at extreme speed, taking on unglamorous and conventionally impossible problems that directly unlock step-function gains in model capability.
The deployment of capable humanoids at scale will permanently redefine human labor. Undesirable physical work will disappear, and human effort will shift toward a new era of abundant creativity.
We are building the infrastructure to accelerate that transition by assembling the Human Archive mafia. You will own meaningful systems from day one and see your work directly impact model capabilities. This is a once-in-a-generation inflection point. If you want to help reshape physical labor and work on problems that matter at civilizational scale, join us.
The Opportunity
As an Infrastructure Engineer at Human Archive, you will build and optimize the storage and offloading infrastructure powering PB-scale multimodal robotics data collection used to model human embodied intelligence.
This is a hands-on role focused on high-throughput data movement, storage systems, AWS optimization, real-time offloading, and distributed infrastructure reliability. You’ll work across multimodal sensor streams, server infrastructure, GPU compute, and large-scale data pipelines while improving throughput, storage efficiency, and deployment reliability across global operations.
Your work will shape how frontier labs and leading robotics companies train their models, transforming physical labor markets and economies while contributing to broader research into human embodied intelligence.
What You’ll Do
Build PB-scale storage and offloading infrastructure for multimodal robotics data
Design high-throughput upload and distributed storage systems
Optimize AWS costs, networking performance, and compute efficiency
Manage servers, GPU nodes, and deployment infrastructure
Build reliable ingestion and retrieval pipelines
Identify and eliminate infrastructure bottlenecks
Improve infrastructure speed, reliability, and operational efficiency
Prototype quickly and iterate from deployment feedback
What We’re Looking For
5+ years of experience in infrastructure or large-scale data engineering
Strong experience with AWS, networking, storage systems, and distributed compute
Experience building high-throughput data pipelines
Strong proficiency in Python, Go, Rust, C++, or similar languages
Strong systems thinking focused on speed and cost optimization
Experience working with PB-scale datasets or multimodal infrastructure is a strong plus
Bangalore-based or willing to relocate

