AI Quality & Evaluation Manager (Contract)
Location: Hybrid working, 3 days per week in Blackfriars
Contract: 6 months, Outside IR35
Are you passionate about building the future of AI quality? Do you thrive in hands-on roles where you can shape frameworks from the ground up and make a real impact? We’re looking for an experienced AI Quality & Evaluation Manager to join our team on a contract basis and lay the foundations for robust, reliable, and user-focused AI services across our business.
What You’ll Do
* Design and implement a comprehensive AI testing and evaluation framework for all AI solutions, including LLM-based tools, RAG systems, and third-party platforms.
* Define and document quality standards for semantic accuracy, factual consistency, bias, tone, and relevance.
* Develop reusable testing templates, data sets, and evaluation methods that can be scaled and maintained by internal teams.
* Run hands-on testing of AI prototypes and production tools to assess technical performance and business value.
* Collaborate with business users to guide practical testing and feedback processes.
* Deliver training and upskilling materials to empower internal staff to sustain the framework after your contract ends.
* Support vendor evaluations and POC assessments with robust test protocols.
* Establish baseline metrics and dashboards to measure ongoing AI quality and relevance.
* Work closely with engineering and product leads to embed testing into delivery workflows.
* Champion responsible AI practices to ensure fairness, transparency, and user trust.
What You’ll Bring
* Strong hands-on experience in testing and evaluation of AI or software systems, ideally with NLP or LLM-based applications.
* Understanding of prompt evaluation, semantic search, and LLM behaviour (accuracy, hallucination, bias, tone, etc.).
* Familiarity with tools like TruLens, Humanloop, PromptLayer, or similar; experience designing QA approaches for GenAI environments.
* Knowledge of modern AI architectures (RAG pipelines, embeddings, and integrations with APIs such as OpenAI, Azure OpenAI, and Anthropic).
* Experience designing and implementing structured test regimes in fast-evolving contexts.
* Excellent communication and facilitation skills, with the ability to engage both technical and business audiences.
* Proven ability to create sustainable frameworks, documentation, and training materials.
Who You Are
* A builder who loves creating practical, scalable solutions.
* Hands-on and analytical, balancing experimentation with process.
* Collaborative and empathetic, bridging technical and non-technical teams.
* User-focused, driven by delivering real value.
* Committed to responsible AI, fairness, and transparency.
Ready to shape the future of AI quality with us?
Apply now and help us ensure our AI-enabled services are accurate, consistent, and trusted by all.