About
This is a well-funded frontier AI scale-up building agentic systems that automate complex, multi-step work.
The product and research stacks are maturing, and safety is increasingly a question of how those systems behave in realistic environments rather than in isolated model evaluations.
Model safety still matters, but once systems can browse, call tools, and complete multi-step tasks, a different class of risk appears. This role is about understanding and reducing that risk at the agent layer.
The brief is still taking shape, and that is part of the appeal: they want someone who can help define what good agent safety looks like in practice.
What you’ll do
* Design and run safety evaluations for agent behaviour across realistic tasks and environments
* Identify failure modes in tool use, planning, browsing, and multi-step execution
* Build mitigations, guardrails, and intervention strategies around risky agent behaviour
* Work with research and engineering teams to improve safe behaviour without sacrificing usefulness
* Turn concrete incidents or near-misses into better tests, policies, and system changes
* Help define internal frameworks for agent safety, oversight, and operational risk
* Contribute hands-on to the systems used to evaluate and monitor safety over time
What you’ll need
* Strong engineering or research background in AI safety, alignment, evaluations, or LLM systems
* Good Python skills and comfort building practical tools rather than only writing papers
* Ability to reason clearly about risk in real product behaviour, not just offline benchmarks
* Experience designing evaluations, red-teaming systems, or analysing model/agent failures
* Strong judgment, clear communication, and comfort with an evolving brief
* Interest in the safety problems that appear once agents start acting in the world
Nice to have
* Experience with policy systems, trust and safety, or security-style threat modelling
* Familiarity with tool-using agents or computer-use systems
Shortlisted candidates will be contacted within 48 hours.