Machine Learning Operations Engineer II

Headquartered in Cambridge, Massachusetts, CarGurus is the all-in-one platform that’s moving the entire car shopping journey online and guiding customers through each step. This includes everything from selling an old car to financing, purchasing, and delivering a new one. Today, millions of consumers visit cargurus.com each month, and more than 30,000 dealerships use our products. We have a people-first culture that fosters kindness, collaboration, and innovation, while empowering our Gurus with tools and resources to fuel their career growth. Our goal is to give all people—consumers, dealers, and our employees—the power to reach their destination.
Job Description
Role overview
As a core member of the Machine Learning Platform team, the Machine Learning Operations Engineer will be responsible for enhancing and maintaining CarGurus’ cloud-hosted ML platform. You will partner closely with data scientists to deploy machine learning models to production and to build and maintain the APIs and data pipelines that integrate predictive intelligence into CarGurus’ products. You will have the opportunity to contribute to systems supplying Recommendations, Search Ranking, Computer Vision, Instant Market Value, and more.
What you’ll do
- Write production-quality training jobs and inference APIs for our Python ML models, deploying them to robust, scalable services
- Contribute enhancements to the CarGurus ML platform, leveraging technologies such as AWS SageMaker, GitHub Actions, and Docker
- Participate in systems design conversations with our data scientists and engineering partners, using your engineering expertise and experience to help them design scalable and robust systems
- Develop in-house tools and libraries to standardize and accelerate the ML development process
- Own and maintain aspects of the Data Science team’s engineering infrastructure
- Promote and foster an inclusive, transparent, and collaborative culture
What you’ll bring
- 2-3 years of experience writing and debugging Python code
- Familiarity with software engineering tools and standard methodologies, e.g. git, unit testing, object-oriented design, containerization
- A working understanding of the machine learning lifecycle, including model training, evaluation, deployment, and monitoring
- Familiarity with the Python ML ecosystem (e.g. scikit-learn, XGBoost, PyTorch, numpy, pandas)
- Experience deploying, monitoring, and troubleshooting ML models in a public cloud (we use AWS)
- Knowledge of SQL and familiarity with cloud data warehouses (we use Snowflake)