Senior Software Engineer - ML Ops

CARFAX

Description

Join Team CARFAX as a Senior Software Engineer - ML Ops

 

We are looking for a seasoned Senior Software Engineer - ML Ops to join our platform team and take an active role in designing, scaling, and operating the infrastructure that powers Large Language Model (LLM) development and hosting. This is a high-impact, highly technical position where you will own critical platform components, drive architectural decisions, and directly shape the reliability, performance, and security of our AI infrastructure.

 

At its core, this is a Kubernetes-first, cloud-native platform engineering role. We care deeply about your ability to architect and operate scalable, resilient infrastructure for LLM workloads — the specific cloud or tooling you've built that experience on is secondary. Our current platform runs on AWS with EKS, Flyte, ArgoCD, JupyterHub, and the LGTM observability stack, and you'll be working within that environment — but we are far more interested in the depth of your platform thinking than in a specific vendor background.

 

If you are an engineer who thrives at the intersection of AI/ML and cloud-native infrastructure, who gets excited about solving the unique scaling and operational challenges that LLM workloads demand, and who wants to work on technology that sits at the absolute cutting edge of the AI industry — this role was built for you.

 

At CARFAX, we believe in the power of teamwork and value in-person interactions so that we can collaborate and thrive together. This position will require 2 days in the London, ON office per week, subject to change with future business needs. One last thing: Our four-day week continues in Summer 2026! 

 

What You'll Own:

 

LLM Platform Architecture — Actively participate in the design and evolution of the core infrastructure platform supporting LLM training, fine-tuning, and inference workloads at scale, contributing architectural decisions that balance performance, cost, and reliability across the full platform lifecycle.

 

Kubernetes & Advanced Autoscaling — Own the design and implementation of sophisticated K8s autoscaling strategies (HPA, VPA, KEDA, Cluster Autoscaler) tailored to the highly variable and GPU-intensive demands of LLM workloads. Our current environment is EKS, though equivalent production Kubernetes experience on GKE, AKS, or on-prem is equally valued.

 

ML Workflow Orchestration — Participate in the engineering and optimization of ML pipeline infrastructure, contributing to best practices for pipeline design, resource allocation, and workflow reliability across LLM training and evaluation workloads. We currently use Flyte — experience with comparable platforms such as Kubeflow, Airflow, or Prefect translates well.

 

AI Developer Platform — Own and contribute to the architecture and operations of interactive compute environments used by AI researchers and LLM engineers to develop, experiment, and prototype. We run JupyterHub today, though experience with equivalent multi-user ML development platforms is directly applicable.

 

CI/CD & GitOps — Participate in the development and ongoing improvement of GitOps workflows and CI/CD pipelines, contributing to deployment best practices and enabling rapid, reliable delivery of platform changes. Our current implementation uses ArgoCD — strong experience with GitOps principles and comparable tooling is what matters.

 

Observability & Reliability — Contribute to the full observability stack implementation — designing dashboards, defining SLOs, building alerting frameworks, and ensuring deep visibility into LLM workload performance and platform health. We use the LGTM stack (Loki, Grafana, Tempo, Mimir) — experience with Prometheus, OpenTelemetry, ELK, Datadog, or equivalent platforms is welcomed.

 

Cloud Infrastructure — Participate in cloud infrastructure design across compute (including GPU instance families), storage, networking, and IAM, with a strong emphasis on cost optimization and operational excellence. Our primary cloud is AWS — candidates with strong GCP or Azure backgrounds who are prepared to work in AWS are encouraged to apply.

 

Security & Compliance — Engage actively in the vulnerability assessment and remediation program across all platform components, contributing to security standards and ensuring the LLM platform meets organizational and regulatory compliance requirements.

 

Collaborative Engineering — Participate in technical design reviews, contribute to roadmap discussions, and serve as a knowledgeable resource and collaborative partner across AIOps and MLOps disciplines

 

Required Experience & Skills:

 

7+ years of experience in DevOps, Platform Engineering, MLOps, or a closely related infrastructure discipline.

 

Deep Kubernetes expertise — production experience operating Kubernetes at scale on any major managed platform (EKS, GKE, AKS) or on-premises, with advanced knowledge of scheduling, autoscaling, networking, RBAC, and cluster operations.

 

Cloud infrastructure proficiency — extensive experience designing and operating production workloads on at least one major cloud provider (AWS, GCP, or Azure), covering compute, storage, networking, and identity and access management

 

MLOps / AI Infrastructure experience — demonstrated experience building and operating infrastructure that supports ML training, model serving, or LLM workloads, including GPU resource management and scheduling at scale

 

CI/CD & GitOps — strong hands-on experience with GitOps principles and modern CI/CD pipeline design, using any mainstream tooling (ArgoCD, Flux, GitHub Actions, Tekton, or equivalent)

 

Observability Engineering — production experience designing and operating observability platforms including metrics, logging, and distributed tracing, using any modern stack (Grafana/LGTM, Prometheus, Datadog, ELK, or equivalent)

 

Infrastructure as Code — strong proficiency with Terraform, Helm, or comparable IaC and configuration management tooling.

 

Programming & Scripting — solid coding ability in Python and/or Go, with experience writing automation, tooling, and infrastructure integrations.

 

Security Mindset — hands-on experience with vulnerability scanning, remediation workflows, and cloud security best practices including RBAC hardening and secrets management

 

Strongly Preferred:

Direct experience with Flyte or comparable ML workflow orchestration platforms (Kubeflow, Airflow, Prefect, Metaflow)

Experience operating JupyterHub or equivalent multi-user interactive compute platforms at scale

Familiarity with LLM-specific infrastructure — model serving frameworks (vLLM, Triton, TorchServe), GPU cluster management, large-scale distributed training setups

Hands-on experience with AWS (EKS, EC2 GPU families, S3, IAM, VPC) as our current primary cloud environment

Experience with FinOps practices — cloud cost attribution, rightsizing, and spot/preemptible instance strategies for ML workloads

Relevant certifications: CKA / CKS, AWS/GCP/Azure Solutions Architect or DevOps Engineer, or equivalent

Who You Are:

A systems thinker who understands how architectural decisions ripple across reliability, performance, cost, and security — regardless of which cloud or tooling stack those decisions are made within

Operationally minded — you build things to be observable, maintainable, and resilient from day one

Deeply curious about AI and LLMs — you understand why the infrastructure you build matters and stay current with how the AI landscape is evolving

Proactive and ownership-driven — you identify problems before they become incidents and drive solutions to completion

An effective collaborator and communicator who can translate complex infrastructure concepts for AI researchers, data scientists, and engineering leadership alike

Comfortable operating with autonomy in a fast-moving environment where priorities evolve alongside the AI landscape

 

Why This Role Stands Out:

LLM infrastructure is one of the most technically demanding and strategically important engineering domains in the industry today. As a senior member of our AIOps team you will:

Directly shape the platform that enables LLM development and productionization — your contributions will have immediate, measurable impact

Work on genuinely hard infrastructure problems — GPU scheduling, large-scale distributed workloads, high-throughput model serving, and multi-tenant ML environments

Be positioned at the epicenter of the AI infrastructure space, one of the fastest growing and highest-demand engineering disciplines in the industry

Have a clear voice in technical direction — your experience and opinions on platform design are genuinely valued and actively sought

Bring your full experience to the table — whether you've built on AWS, GCP, Azure, or hybrid environments, your platform engineering expertise is what drives impact here

 

What’s in it for you: 

  • Competitive Compensation: Attractive salary, comprehensive benefits, and generous time-off policies. 
  • Flexible Work Schedules: Enjoy 4-day summer work weeks and a winter holiday break. 
  • Retirement Support: 401(k) / DCPP matching. 
  • Performance Rewards: Annual bonus program to recognize your contributions. 
  • Innovative Workspace: Casual, dog-friendly offices designed for creativity and collaboration. 

 

Our accolades speak for themselves:

  • 10X Virginia Business Best Places to Work
  • 9X Washingtonian Great Places to Work
  • 9X Washington Post Top Workplace
  • St. Louis Post-Dispatch Best Places to Work

 

About CARFAX and S&P Global Mobility
S&P Global has recently announced the intent to separate our Mobility Segment into a standalone public company.

CARFAX, part of S&P Global Mobility, helps millions of people every day confidently shop, buy, service and sell used cars with innovative solutions powered by CARFAX vehicle history information. The expert in vehicle history since 1984, CARFAX provides exclusive services like CARFAX Used Car Listings, CARFAX Car Care, CARFAX History-Based Value and the flagship CARFAX® Vehicle History Report™ to consumers and the automotive industry. CARFAX owns the world’s largest vehicle history database and is nationally recognized as a top workplace by The Washington Post and Glassdoor.com. Shop, Buy, Service, Sell – Show me the CARFAX™. S&P Global Mobility is a division of S&P Global (NYSE: SPGI). S&P Global is the world’s foremost provider of credit ratings, benchmarks, analytics and workflow solutions in the global capital, commodity and automotive markets.

US Equal Opportunity Employer Statement: CARFAX is an Affirmative Action/Equal Opportunity Employer. It is the policy of CARFAX to provide equal employment opportunity to all persons regardless of race, color, sex, pregnancy, religion, national origin, age, ancestry, citizenship status, veteran status, military status, disability or handicap, sexual orientation, genetic information or any other status protected by federal, state or local law. In addition, CARFAX will provide reasonable accommodations for qualified individuals with disabilities. We maintain a drug-free workplace. We are a participant in E-Verify.

Canadian Equal Opportunity Employer Statement: CARFAX Canada is an equal opportunity employer, and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law.

We’re committed to providing accommodations by request for candidates taking part in all aspects of the recruitment and selection process. For a confidential inquiry or to request an accommodation, please contact your recruiter or email .

Vacancy posted 10 hours ago