Role: Senior AI Engineer (LLMOps & RAG)
Location: Remote (UK-based preferred)
Type: Full-time
Compensation: Competitive
About Veridox:
Veridox is an AI-driven fraud detection platform purpose-built for insurers. We combine document analysis with contextual intelligence to produce detailed risk analyses, with a strong focus on trust, accuracy, and explainability. As part of our growing team, you’ll play a key role in scaling the technology that powers our platform.
The Role:
We’re looking for a hands-on, delivery-first engineer to lead the development and optimisation of our LLM and RAG pipelines. This isn’t a research role. You’ll be responsible for building, benchmarking, and deploying high-performance, cost-efficient AI features that work, and improve, in production.
We’re not looking for 100-page white papers. We’re looking for someone who can ship features, track performance, and find novel solutions to customers’ problems.
What You’ll Do:
• Build and optimise RAG pipelines using AWS Bedrock, OpenSearch, and vector stores
• Own our “Golden Dataset”, curating the ground-truth set we use to evaluate model outputs
• Automate evaluation using tools like RAGAS, DeepEval, or custom “LLM-as-a-judge” logic (a minimal sketch of this pattern follows the list)
• Track drift, hallucination, and cost using observability tooling (Arize, Phoenix, etc.)
• Design self-improving systems where user interaction data flows back into future retrieval/ranking
• Balance cost and performance by selecting the right model for the right task (Claude, SLMs, or whatever gets the job done)
• Write clean and fast Python and ship infrastructure as code
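To give a flavour of the evaluation work described above, here is a minimal “LLM-as-a-judge” sketch. The dataset shape, prompt wording, and the `call_judge_model` stub are assumptions made for illustration, not our production pipeline or any specific library’s API.

```python
"""Illustrative only: a minimal LLM-as-a-judge loop over a golden dataset."""
import json
from statistics import mean

JUDGE_PROMPT = """You are grading a fraud-analysis answer against a reference.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Return JSON: {{"score": <1-5>, "reason": "<one sentence>"}}"""


def call_judge_model(prompt: str) -> str:
    """Placeholder for a real judge-model call (e.g. Claude via AWS Bedrock).
    Returns a canned verdict so the sketch runs end to end."""
    return json.dumps({"score": 5, "reason": "Matches the reference conclusion."})


def judge_one(item: dict) -> dict:
    """Score a single golden-dataset item and keep the judge's reasoning."""
    verdict = json.loads(call_judge_model(JUDGE_PROMPT.format(**item)))
    return {"id": item["id"], "score": verdict["score"], "reason": verdict["reason"]}


def run_eval(golden_dataset: list[dict]) -> dict:
    """Aggregate per-item judgements into a single tracked metric."""
    results = [judge_one(item) for item in golden_dataset]
    return {"mean_score": mean(r["score"] for r in results), "results": results}


if __name__ == "__main__":
    dataset = [
        {
            "id": "claim-001",
            "question": "Does the invoice date fall within the policy period?",
            "reference": "No; the invoice predates the policy start date.",
            "candidate": "The invoice was issued before cover began, so no.",
        }
    ]
    print(run_eval(dataset))
```

In practice the stub would be replaced with a real model call, and the aggregate score tracked per release so regressions show up before they reach customers.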
Tech Stack:
If your experience covers a mix of the platforms and technologies below, we'd like to hear from you.
• Languages: Python, TypeScript, HCL
• Vector & Search: OpenSearch, AWS S3 Vectors
• Observability & Evaluation: Arize, Phoenix, RAGAS, DeepEval
• Infrastructure: AWS Step Functions, Azure Function Apps
• DevOps: CI/CD pipelines (Bitbucket)
What We’re Looking For:
• Proven experience building LLM/RAG pipelines in production
• Confidence in statistical evaluation (sample sizes, regression testing)
• Ability to define evaluation metrics and continuously improve model outputs
• Strong understanding of unit economics in LLM systems (token cost, latency, accuracy trade-offs); see the back-of-envelope sketch after this list
• Clear communicator who can flag blockers early and ship fast
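By “unit economics” we mean the kind of back-of-envelope comparison sketched below. The model profiles, prices, latencies, and accuracy figures are made-up placeholders for illustration, not quotes for any real model.

```python
"""Illustrative only: comparing cost, latency, and accuracy across model choices."""
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    usd_per_1k_input_tokens: float
    usd_per_1k_output_tokens: float
    p50_latency_s: float
    eval_accuracy: float  # mean score on the golden dataset, 0-1


def cost_per_request(m: ModelProfile, input_tokens: int, output_tokens: int) -> float:
    """Token cost of a single request in USD."""
    return (
        input_tokens / 1000 * m.usd_per_1k_input_tokens
        + output_tokens / 1000 * m.usd_per_1k_output_tokens
    )


if __name__ == "__main__":
    # Hypothetical profiles: a frontier model vs. a small (7B-8B) model.
    large = ModelProfile("frontier", 0.003, 0.015, 2.5, 0.92)
    small = ModelProfile("slm-8b", 0.0002, 0.0006, 0.6, 0.85)

    for m in (large, small):
        cost = cost_per_request(m, input_tokens=4000, output_tokens=500)
        print(f"{m.name}: ${cost:.4f}/req, p50 {m.p50_latency_s}s, accuracy {m.eval_accuracy:.0%}")
```

The point is the habit of reasoning, not the numbers: knowing when a cheaper, faster model is accurate enough for a given task, and being able to show the trade-off with data.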
Nice-to-Have:
• Experience with AWS S3 Vectors or a similar vector store
• Familiarity with AI-driven fraud detection, legal tech, or investigative tools
• Prior work with small language models (7B–8B parameters) for cost-effective inference
Why Join Us?
You will work on a system where evaluation is central to the product. You’ll have the autonomy to define standards for building, measuring, and improving complex AI systems.
If you care about rigour, impact, and building things that matter, we’d love to hear from you.