Senior Machine Learning Engineer
Job ID
103771
Employment type
Permanent Full-Time
Description & Requirements
Work Model: Hybrid – minimum of 2 days per week in the Polanco office
Company Overview
Bain & Company is the management consulting firm that the world’s business leaders come to when they want results. Bain advises clients on strategy, operations, information technology, organization, private equity, digital transformation and strategy, and mergers and acquisitions, developing practical insights that clients act on and transferring skills that make change stick. The firm aligns its incentives with clients by linking its fees to their results. Bain clients have outperformed the stock market 4 to 1. Founded in 1973, Bain has 57 offices in 36 countries, and its deep expertise and client roster cross every industry and economic sector.
Position Summary
The Staff Engineer, Machine Learning partners with data, engineering, and product teams to architect and operate scalable ML and LLM-powered systems embedded in user-facing analytics workflows.
This role defines the technical standards and architecture for how machine learning systems are built, evaluated, deployed, and monitored, ensuring reliability, performance, and cost efficiency at scale. The Staff Engineer drives cross-team alignment, resolves complex technical challenges, and delivers measurable improvements in system quality, latency, and business impact.
The role requires strong technical leadership, deep expertise in machine learning systems, and the ability to translate ambiguous business problems into scalable engineering solutions.
Essential Functions
ML System Architecture & Pipeline Development
Define and evolve architecture for ML pipelines powering analytics, Q&A, and insight generation across products.
Select and implement ML/LLM patterns based on measurable quality, latency, and cost tradeoffs.
Build resilient systems capable of handling evolving datasets while improving model performance.
Production Deployment & Reliability
Lead production deployment of LLM systems (open-source and API-based).
Define service level objectives (SLOs) and ensure reliability, scalability, and cost efficiency.
Implement safe rollout strategies and optimize inference workloads through capacity planning.
End-to-End System Ownership
Own business-critical ML systems from architecture through sustained production operations.
Define KPIs and operational standards for ML systems.
Lead postmortems and continuous improvement initiatives.
Translate ambiguous goals into clear technical execution strategies.
Data Quality & Evaluation
Establish robust data contracts and monitoring practices across evolving datasets.
Design evaluation frameworks including dashboards, slice analysis, and regression testing.
Prevent quality regressions and enable continuous improvement of ML systems.
MLOps & CI/CD
Define lifecycle management practices for ML systems including model, data, and prompt versioning.
Establish CI/CD standards for ML systems across teams.
Implement observability, governance, and environment promotion practices.
Technical Leadership & Collaboration
Lead cross-functional architecture decisions across product, data, and engineering teams.
Resolve complex cross-system technical issues impacting ML reliability or cost.
Embed ML capabilities into user-facing workflows.
Mentor engineers and elevate engineering standards through architectural reviews and guidance.
Qualifications
Education
Required:
Bachelor’s degree in Computer Science, Engineering, Data Science, or related field, or equivalent practical experience.
Preferred:
Advanced degree in Computer Science, Machine Learning, or a related technical discipline.
Experience
Required:
6+ years of experience in software engineering, machine learning engineering, or related technical roles.
Experience building and operating production-grade ML systems at scale.
Demonstrated experience designing and deploying ML or LLM-based applications in production environments.
Preferred:
Experience owning ML systems spanning multiple teams or product areas.
Experience operating LLM-powered systems in production with measurable business impact.
Experience shaping shared infrastructure or technical standards within an ML domain.
Knowledge, Skills, and Abilities
Technical Skills
Strong proficiency in Python, SQL, and production-grade software development.
Experience architecting scalable ML and LLM systems embedded in production workflows.
Expertise in designing inference pipelines, retrieval systems, and ML data pipelines.
Ability to anticipate scale, reliability, and cost tradeoffs when designing ML systems.
MLOps & DevOps
Experience defining lifecycle management, deployment pipelines, and model versioning strategies.
Experience implementing CI/CD practices for ML systems.
Knowledge of observability and monitoring practices for production ML environments.
Advanced ML Expertise
Deep expertise in NLP and LLM systems, including transformers, retrieval architectures, fine-tuning, and structured extraction.
Experience designing large-scale entity resolution or data integration systems.
Experience building and operating ML infrastructure in cloud environments.
Problem Solving & Communication
Ability to resolve complex technical challenges across systems or teams.
Strong communication skills and ability to translate business objectives into engineering solutions.
Experience driving alignment across product, engineering, and data stakeholders.
Proven mentorship and technical leadership experience.
Language Requirement
Advanced English proficiency (written and spoken) required.