Job Description
Are you ready to be at the forefront of technology, driving innovation and modernizing the world's most complex systems? Join the Aumni team at JPMorgan Chase as an MLOps Engineer III, where your expertise will help build and optimize our core model hosting, deployment, and monitoring infrastructure in AWS. We offer unparalleled opportunities for career growth and a collaborative environment where you can thrive and contribute to meaningful projects.
As an MLOps Engineer III at JPMorgan Chase within Aumni, your role will involve addressing complex business challenges with simple solutions. Your responsibilities will include configuring, maintaining, monitoring, and optimizing models developed by our data science teams. You will play a significant role in ensuring end-to-end operations, availability, reliability, and scalability in the AI/ML domain.
Job Responsibilities
- Guide and assist in designing and deploying new AI/ML models in the cloud, gaining consensus from peers where appropriate.
- Design and implement automated continuous integration and continuous delivery pipelines for the Data Science teams.
- Write and deploy infrastructure as code for the models and pipelines you support.
- Collaborate with technical experts, key stakeholders, and team members to resolve complex technical problems.
- Apply monitoring and observability best practices in the AI/ML space, using service level indicators and service level objectives.
- Proactively resolve issues before they impact stakeholders of deployed models.
- Support the adoption of MLOps best practices within your team.
Required Qualifications, Capabilities, and Skills
- Formal training or certification on MLOps concepts and 3+ years of applied experience.
- Understanding of MLOps culture and principles, with familiarity in implementing concepts at scale.
- Domain knowledge of machine learning applications and technical processes within the AWS ecosystem.
- Experience with infrastructure as code tooling such as Terraform and CloudFormation.
- Experience with container technologies such as Docker and orchestration platforms such as Kubernetes and ECS.
- Knowledge of continuous integration and continuous delivery tools like Jenkins, GitLab, or GitHub Actions.
- Proficiency in programming and scripting languages such as Python and Bash.
- Hands-on knowledge of Linux and networking internals.
- Understanding of roles served by data engineers, data scientists, machine learning engineers, and system architects, and how MLOps contributes to these workstreams.
Preferred Qualifications, Capabilities, and Skills
- Experience with model training and deployment pipelines, including management of scoring endpoints.
- Familiarity with observability concepts and telemetry collection using tools like Datadog, Grafana, Prometheus, and Splunk.
- Understanding of data engineering platforms such as Databricks or Snowflake, and machine learning platforms like AWS SageMaker.
- Comfortable troubleshooting common containerization technologies and issues.
- Ability to proactively recognize roadblocks, with a demonstrated interest in learning technologies that facilitate innovation.
- Ability to identify new technologies and relevant solutions that help the Data Science and Machine Learning teams meet design constraints.
- Comfortable with team collaboration, presenting technical concepts to non-technical audiences, and researching system design options.