Job Description
We’re supporting a financial services organisation that is building an enterprise AI platform. They are looking for an AI QA Engineer to define how large-scale automated testing operates across a RAG-enabled ecosystem and to ensure their AI platforms are accurate, secure and production-ready.
You’ll be responsible for validating LLM and RAG-based AI systems, building automated tests around AI behaviours, and helping define how AI quality, hallucinations, security and observability are measured at scale.
What you’ll be doing:
* Designing automated testing strategies for AI/LLM systems
* Validating RAG pipelines and retrieval accuracy
* Building frameworks to test AI quality, consistency and performance
* Measuring and assessing hallucinations and model behaviour
* Testing AI security boundaries, permissions and access controls
* Supporting observability and reporting through Datadog dashboards
* Working with engineering teams to define AI quality standards
* Testing agentic workflows and integrations including MCP environments
What they’re looking for:
* Strong background in large-scale test automation/QA engineering
* Previous experience testing AI systems, LLMs or GenAI applications
* Experience validating RAG architectures
* Knowledge of hallucination testing/evaluation frameworks
* Exposure to AWS Bedrock and Python
* Understanding of AI security, access controls and governance
* Familiarity with observability tooling (Datadog ideal)
* Bonus: MCP/agent frameworks experience
This is a 6-month contract paying £650 a day (inside IR35/umbrella rate). The client operates a hybrid working model, and you will be expected to work 3 days a week in their London office.