Senior Research Engineer, Model Evaluation

Cohere

Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

Evaluation is critical to making progress in scaling intelligence. As models continue to become superhuman in many real-world use cases, we must continue to develop new techniques to accurately measure our models' performance on frontier capabilities. In this role, you are responsible for creating next-generation evaluation methods and scalable infrastructure to measure LLM progress.

As a Senior Research Engineer, Model Evaluation, you will:

Develop evaluation benchmarks, datasets, and environments for measuring the bleeding edge of model capabilities
Conduct research to push the state of the art in LLM evaluation methods, including training LLM judges, improving evaluation efficiency, and building high-quality datasets at scale
Build scalable tools for investigating and understanding evaluation results that are used by all members of technical staff at Cohere, as well as leadership and our CEO
Learn from and work with the best researchers and engineers in the field

You may be a good fit if:

You enjoy pushing the limits of what LLMs are capable of, and you have built high-quality evaluation resources to measure those capabilities (datasets, simulators, environments, etc.)
You have a track record of developing new methods and/or data to evaluate LLMs, e.g. publications at top-tier conferences, popular benchmarks, etc.
You have deep experience building with and around LLMs, and you have built tools for analyzing and understanding their performance
You have strong software engineering skills

If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply! If you want to work really hard on a glorious mission with teammates who want the same thing, Cohere is the place for you.

We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

Full-Time Employees at Cohere enjoy these Perks:

An open and inclusive culture and work environment

Work closely with a team on the cutting edge of AI research

Weekly lunch stipend, in-office lunches & snacks

Full health and dental benefits, including a separate budget to take care of your mental health

100% Parental Leave top-up for 6 months for employees based in Canada, the US, and the UK

Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement

Remote-flexible, with offices in Toronto, New York, San Francisco, and London, and a co-working stipend

6 weeks of vacation

Note: This post is co-authored by both Cohere humans and Cohere technology.

Vacancy posted more than 2 months ago
