Site Reliability Engineers (SREs) at ASAPP enhance the reliability and performance of infrastructure by automating systems, supporting product teams, and addressing production issues effectively.
At ASAPP, our mission is simple: deliver the best AI-powered customer experience—faster than anyone else. To achieve that, we’re guided by principles that shape how we think, build, and execute. We value customer obsession, purposeful speed, ownership, and a relentless focus on outcomes. We work in tight, skilled teams, prioritize clarity over complexity, and continuously evolve through curiosity, data, and craftsmanship.
We're seeking technologists and problem solvers who thrive in fast-paced environments, love collaborating with great talent, and approach every day like it’s Day 1. We're a globally diverse team with hubs in New York City, Mountain View, Latin America, and India—embracing both hybrid and remote work to bring the best minds together, wherever they are. If you're driven by continuous learning, rapid pivots, and the challenges of building in a high-growth startup, we’d love to talk. This is more than a job—it’s a journey.
Site Reliability Engineers (SREs) are responsible for the overall performance and reliability of ASAPP's infrastructure and products. SREs design and implement the tools that automate building reliable and performant systems. We emphasize building tools over manual processes. We implement, not administer. We’re obsessed with automation, not repetition. Our job is to focus on building reliable infrastructure and tools for our product teams so that they can solve customer problems and deliver new features, not reinvent platforms.
What you'll do
- Work with product engineering teams on service architecture and implementation
- Deliver infrastructure configuration as code and automate everything
- Direct and implement monitoring and alerting systems to support rapid problem diagnosis
- Perform root cause analysis, then design and deliver resolutions
- Work on our Kubernetes / AWS infrastructure to support our product engineers
- Design secure and performant networking solutions in our production systems
What you'll need
- 4+ years of relevant experience bringing software to production at high scale
- Participation in on-call rotation, triaging and addressing production issues
- Obsession with automation and instrumentation
- Understanding of complex systems and failure scenarios
- Excellent communication skills
- Knowledge of AWS services, containers and container management frameworks
- Familiarity with message-bus-based systems and distributed architectures
- Proficiency in Terraform, Python and/or Go
What we'd like to see
- BS or MS degree in Computer Science, or equivalent hands-on experience
- Experience in product-oriented environments
- Experience with scalable distributed applications
Benefits
- Competitive compensation
- Stock options
- Prudent Life Insurance
- Free lunch and dinner
- Connectivity (mobile phone & internet) stipend
- Wellness perks
- Mac equipment
- Learning & development stipend
- Parental leave, including 6 weeks of paternity leave
ASAPP is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, disability, age, or veteran status. If you have a disability and need assistance with our employment application process, please email us at jobs@asapp.com to obtain assistance.
Top Skills
AWS
Go
Kubernetes
Python
Terraform