Test & AI Evaluation Lead

Didcot

Permanent

Posted: 11h ago

Offer description

My client is hiring a Test & AI Evaluation Lead to own how they validate their AI-driven, mission-critical systems - from multi-agent orchestration and LLM outputs through to cloud infrastructure and real-time user-facing applications.

Why This Role?

If you enjoy working close to the technology, influencing how systems are built (not just tested), and tackling the realities of validating AI-driven software, this role gives you genuine ownership and impact.

You'll design and lead test approaches where correctness, resilience, and security matter as much as feature velocity. Working embedded with AI, Backend, Frontend, and DevOps, you'll shape how they validate agent behaviours, data pipelines, and end-to-end operational workflows - from research prototypes through to production deployments. Quality is built in from day one, not inspected at the end.

Key Responsibilities

Test Strategy & Leadership
- Define and own the end-to-end test strategy across AI, backend, frontend, and infrastructure layers.
- Establish testing standards appropriate for agentic AI systems, including non-deterministic behaviour and probabilistic outputs.
- Ensure testing aligns with mission-critical, safety-conscious, and security-first delivery expectations.
- Act as the primary quality authority across projects, advising engineering and product leadership on risk and readiness.

AI & Data-Focused Testing
- Design approaches for testing multi-agent workflows, including orchestration logic, memory/state handling, and tool integrations.
- Define validation strategies for LLM outputs, including groundedness, hallucination detection, task success rates, and regression testing.
- Work with AI Engineers to embed evaluation metrics and pass/fail thresholds into pipelines.
- Validate data ingestion, transformation, and inference pipelines across structured and unstructured data sources.

Automation & Tooling
- Drive a test-automation-first mindset, integrating tests into CI/CD pipelines (GitHub Actions, Argo CD).
- Oversee automated testing across API and service layers, UI (E2E and accessibility), and infrastructure and deployment workflows.
- Select, implement, and evolve testing tools and frameworks appropriate to modern cloud-native and AI systems.

Non-Functional Testing
- Own performance, scalability, reliability, and resilience testing for distributed systems.
- Coordinate security testing activities in line with secure-by-design principles (e.g. IAM, secrets handling, data boundaries).
- Validate backup, disaster recovery, and failover scenarios alongside DevOps and Backend teams.

Delivery & Collaboration
- Embed with delivery teams to ensure testing is planned early and executed continuously.
- Work closely with Product and Engineering to define clear acceptance criteria and definition of done.
- Provide clear, decision-ready quality reporting to technical and non-technical stakeholders.
- Support customer-facing demonstrations, trials, and operational readiness assessments.

Required Skills & Experience
- Proven experience as a Test Manager, Senior Test Lead, or equivalent on complex software systems.
- Strong track record of taking applications into production in regulated environments.
- Strong background in automated testing across APIs, services, and UIs, integrated into CI/CD pipelines.
- Experience testing distributed, cloud-native systems (AWS, GCP, or Kubernetes), including performance, reliability, and resilience.
- Awareness of compliance frameworks (e.g. ISO 27001, NIST, OWASP).
- ISTQB Advanced / Test Manager certification or equivalent practical experience.
- SC Clearance or eligibility to obtain UK SC Clearance.

Preferred Experience
- Experience in UK defence, public sector, or security environments.
- Experience testing AI/ML/LLM-based systems, including non-deterministic outputs.
- Exposure to agent-based or workflow-driven architectures.

Soft Skills
- A pragmatic, delivery-focused mindset - able to balance speed with rigour.
- Comfortable operating in fast-moving, ambiguous, R&D-heavy environments.
- Confidence challenging assumptions and raising quality risks early.
- Strong written and verbal communication, especially around complex technical risk.

Rewards
- Salary: negotiable based on experience and attributes.
- Rapid career progression with meaningful ownership of quality across all products.
- Ability to shape the direction of a fast-moving, successful early-stage business.
- Highly flexible working hours and hybrid working.
