Responsibilities
* Agent scaffolding: tool use, context management, sandboxing, prompt-injection defence
* Evals for fuzzy, high-stakes outputs: assessments, policy interpretation, control mapping
* Reliability infrastructure: retries, fallbacks, circuit breakers, prompt versioning
* Quality bar: defining the internal standard for what "good enough to ship" means for AI features across the organisation
Qualifications
* Backend engineering experience in TypeScript or a comparable language, including 1–2+ years shipping production LLM features
* Experience with agent frameworks, tool calling, and multi-step orchestration
* Production evals chops: dataset curation, LLM‑as‑judge failure modes, regression testing under model swaps
* Strong systems thinking: async, queues, idempotency
* Comfort being the named owner of AI quality, including saying no when needed
Nice to have
* Experience running Anthropic, OpenAI, or open-weight model APIs in production at scale
* Prompt‑injection or agent‑security work
* Background in compliance, audit, or any domain where correctness is fuzzy and stakes are high
Benefits
Location. King's Cross, London (Gridiron building). Expect most days in‑office with flexibility as needed.
Compensation. Top decile for the London market, with meaningful EMI‑eligible options.
Perks. Daily team lunch, specialty coffee, roof terrace, on‑site showers for active commuters, and substantial per‑engineer AI tooling and API budgets.
Interview process. Three stages: behavioural phone screen, technical phone screen, and a paid on‑site work trial. Target turnaround is under two weeks from first conversation.
Tech stack. TypeScript, Node.js, React, Tailwind, OpenAPI, Express, Azure (Container Apps, Service Bus, Front Door, Entra ID), Postgres, Terraform, GitHub Actions, Docker. Anthropic‑first AI with in‑house evals and scaffolding. Claude Code throughout.