AI Agent Reliability Engineer - Chaps

London
Craft Docs Limited, Inc.
Reliability engineer
Posted: 20 July
Offer description

About Craft & Chaps

At Craft, we rethink productivity from first principles. Our products disappear into the background so people can do their life's work: fast, joyfully, and without friction.

Chaps is our new AI-first product, focused on turning a constellation of large-language-model agents into a seamless personal productivity assistant.

About the role

Our AI Product team is looking for an engineer who obsesses over making multi-agent systems robust, observable, and continuously improving. You'll build the test harnesses, evaluation pipelines, and monitoring layers that keep dozens of collaborating agents on-task, on-budget, and on-time.

In practice, that means:
* Designing automated evals that exercise complete agent workflows, catching regressions before they reach users.
* Instrumenting every prompt, tool-call, and model hop with rich telemetry so we can trace root causes in minutes, not days.
* Creating feedback loops that turn logs, user ratings, and synthetic tests into better prompts and safer behaviors.
* Future-proofing agentic systems so that their quality improves as the underlying LLMs become more capable.
You will partner with product, research, and infra to ship an AI assistant users can trust: no surprises, no downtime.

What we're looking for

You must have:
* Hands-on experience with LLM evaluation frameworks (e.g., OpenAI Evals, LangSmith, LLM-Harness) and a track record of turning eval results into product-ready gating.
* Observability chops: you've wired up tracing and metrics for distributed systems (OpenTelemetry, Prometheus, Grafana) and know how to set SLOs that actually matter.
* Prompt-engineering fluency (few-shot, function-calling, RAG orchestration) and an instinct for spotting ambiguity or jailbreak vectors.
* Production-grade Python/TypeScript skills and comfort shipping through CI/CD (GitHub Actions, Terraform, Docker/K8s).
* A bias for experimentation: you automate A/B tests, cost-latency trade-off studies, and rollback safeguards as part of the dev cycle.
It would be great if you have:
* Experience scaling multi-agent planners or tool-using agents in real products.
* Familiarity with vector databases, semantic diff tooling, or RLHF/RLAIF pipelines.
* A knack for weaving human feedback (support tickets, thumbs-downs) into automated regression tests.

Our Culture

* Think differently. We value novel ideas over legacy playbooks, and we give you room to explore.
* People first. You instrument systems so users never feel the bumps; you collaborate so teammates never feel stuck.
* Pragmatic craftsmanship. We ship fast, but we measure twice: data accuracy, latency budgets, and reliability all matter.
* Clear communicators. You translate metrics into stories that product managers and designers understand, sparking better decisions.
Join us if you want to make AI that works: every request, every time.
