We are seeking an AI Evaluation & Model Quality Specialist to support the delivery and validation of AI-driven solutions in collaboration with a global technology partner. The role will focus on defining and executing robust evaluation frameworks to measure model accuracy, reliability, and production readiness across speech-to-text, summarisation, and intent-recognition AI systems.
Working closely with engineering, product, and partner teams, the successful candidate will design metrics, curate high-quality ground-truth datasets, and conduct rigorous model validation to ensure solutions meet agreed performance and governance standards before deployment.
Key Responsibilities
* Design and implement evaluation frameworks for AI models, including speech-to-text and generative AI outputs.
* Define and apply appropriate performance metrics (e.g., word error rate, semantic accuracy, relevance, completeness) and establish acceptance thresholds.
* Create, validate, and maintain high-quality labelled ground-truth datasets to support transcription, summarisation, and intent evaluation.
* Conduct statistical analysis and systematic error diagnostics to identify root causes and compare model performance.
* Support model validation and governance activities, including regression testing and quality sign-off across system integration testing (SIT), user acceptance testing (UAT), and production readiness cycles.
* Provide empirical insights to guide prompt optimisation and model tuning, balancing accuracy, latency, and cost considerations.
* Contribute to post-deployment monitoring frameworks, including model performance tracking, drift detection, and continuous improvement processes.
* Translate technical evaluation outcomes into clear, evidence-based insights for business and stakeholder audiences.
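For context on the metrics side of the role: word error rate, one of the metrics named above, is conventionally computed as the word-level edit distance between a reference transcript and a system hypothesis, normalised by the reference length. A minimal illustrative sketch (not part of any specific toolchain used in this role):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance between the
    reference and hypothesis, divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table of edit distances between prefixes
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

A hypothesis that drops one word of a six-word reference, for instance, scores a WER of 1/6; in practice, teams typically rely on established implementations rather than hand-rolled ones.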
Key Skills & Experience
* Strong understanding of AI evaluation methodologies and performance metrics, particularly for speech-to-text and generative AI systems.
* Experience designing and managing labelled datasets for model testing and validation.
* Proficiency in statistical analysis, model benchmarking, and structured error analysis.
* Experience working within model validation, testing, or AI governance frameworks.
* Familiarity with prompt engineering and empirical model optimisation approaches.
* Understanding of monitoring strategies for deployed AI systems, including performance degradation and drift detection.
* Strong communication skills with the ability to present technical findings clearly to non-technical stakeholders.
Working Environment
The role will operate within a cross-functional delivery team and collaborate closely with a global technology partner to ensure AI solutions are rigorously evaluated, governed, and ready for enterprise deployment.