Senior Research Engineer, Model Evaluation

Cohere

Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

Evaluation is critical to making progress in scaling intelligence. As models continue to become superhuman in many real-world use cases, we must continue to develop new techniques to accurately measure our models' performance on frontier capabilities. In this role, you are responsible for creating next-generation evaluation methods and scalable infrastructure to measure LLM progress.

As a Senior Research Engineer, Model Evaluation, you will:

  • Develop evaluation benchmarks, datasets, and environments for measuring the bleeding edge of model capabilities

  • Conduct research to push the state-of-the-art in LLM evaluation methods, including training LLM judges; improving evaluation efficiency; and scalably building high-quality datasets

  • Build scalable tools for investigating and understanding evaluation results that are used by all members of technical staff at Cohere, as well as leadership and our CEO

  • Learn from and work with the best researchers and engineers in the field

You may be a good fit if:

  • You enjoy pushing the limits of what LLMs are capable of, and you have built high-quality evaluation resources to measure those capabilities (datasets, simulators, environments, etc.)

  • You have a track record of developing new methods and/or data to evaluate LLMs, e.g. publications at top-tier conferences, popular benchmarks, etc.

  • You have deep experience building with and around LLMs, and you have built tools for analyzing and understanding their performance

  • You have strong software engineering skills

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply! If you want to work really hard on a glorious mission with teammates who want the same thing, Cohere is the place for you.

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Full-Time Employees at Cohere enjoy these Perks:

An open and inclusive culture and work environment 

Work closely with a team on the cutting edge of AI research

Weekly lunch stipend, in-office lunches & snacks

Full health and dental benefits, including a separate budget to take care of your mental health 

100% Parental Leave top-up for 6 months for employees based in Canada, the US, and the UK

Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement

Remote-flexible, with offices in Toronto, New York, San Francisco, and London, and a co-working stipend

✈️ 6 weeks of vacation

Note: This post is co-authored by both Cohere humans and Cohere technology.

Vacancy posted more than 2 months ago