AI Evaluations Engineer

Manchester
ConnexAI
Engineer
Posted: 9h ago
Offer description

Role summary

This role sits at the centre of how we measure and improve AI systems in production.

You’ll define what good performance means across LLMs, ASR, TTS, and full speech-to-speech pipelines, and build the datasets, metrics, and evaluation systems that make AI quality measurable and comparable in the real world.

You’ll work closely with engineering and product teams to ensure model changes lead to real improvements in user experience, not just better offline benchmarks.


What you’ll do

* Design and run evaluations across LLM, ASR, TTS, and speech-to-speech systems
* Build real-world datasets and test cases from production behaviour and edge cases
* Define metrics and scorecards for model and system quality
* Benchmark internal models against external and frontier systems
* Evaluate full pipelines (ASR → LLM → TTS), not just individual models
* Build Python tools to automate evaluation workflows
* Create internal leaderboards, red-teaming setups, and regression tests
* Work with engineers and product teams to diagnose system failures
* Turn vague product goals into measurable evaluation frameworks


What this role is about

* Defining and measuring AI quality in production systems
* Turning real user behaviour into structured evaluation signals
* Ensuring model changes improve real-world performance
* Understanding why AI systems fail, not just whether they do


What good looks like

* You can translate “improved quality” into measurable metrics
* You think in terms of system impact (before vs after), not just accuracy
* You’re comfortable working across code, data, and production systems
* You care about real-world behaviour, not just benchmarks


Core skills

* Strong Python (scripting, data analysis, tooling)
* Experience with ML systems, evaluation, or experimentation
* Understanding of LLMs or speech systems (ASR / TTS)
* Ability to design test cases and structured datasets
* Comfortable working with engineers and product teams


Nice to have

* Experience with LLM evaluation or benchmarking
* Exposure to speech or multimodal systems
* Familiarity with production APIs or ML systems
* Experience with automated testing or CI-style workflows

© 2026 Jobijoba - All Rights Reserved

Apply
Create E-mail Alert
Job alert activated
Saved
Save