Requirements
Must have:
- Strong experience with training deep learning models in production
- In-depth knowledge of PyTorch with hands-on experience in torch.distributed (DDP/FSDP-style training)
- Experience of training large sequence models or LLMs at scale
- Software engineering background with Python; familiarity with TypeScript and/or Golang
- Distributed systems/training ops experience with practical knowledge of multi-node jobs on GPU clusters (Slurm, Kubernetes, or managed cloud equivalents)
- Familiarity with GPU performance tuning (memory usage, mixed precision, throughput vs. latency trade-offs)
- Experience within a reinforcement learning environment
- Collaborative with great communication skills
- Degree educated to BSc/MSc in a relevant discipline
Responsibilities:
- Take open-source LLMs and convert them into high-performance software engineer agents using supervised fine-tuning and large-scale reinforcement learning
- Design and run extensive training experiments across multi-node GPU clusters
- Build RL loops where models write code and receive feedback based on real test outcomes
- Push long-context and MoE-style architectures to their limits
- Work hands-on across the full stack, including custom PyTorch dataloaders, distributed training, and debugging NCCL issues
- Design opinionated reward functions that reflect exceptional engineering practices
- Extend benchmark suites and test models on real-world repositories
- Analyze failure modes and provide insights to improve data and training strategies
- Collaborate with infrastructure, product, and research teams to inform training decisions and how results are measured
Company:
We are a London-based tech start-up with £5 million in recent pre-seed funding, building an agentic AI platform that writes production-grade code. We offer a dog-friendly office with daily catered lunches, 30 days of holiday (including bank holidays), a salary of up to £110k, equity options, a pension, and monthly socials. There is no expectation to work beyond our standard working hours. We are looking for a Machine Learning Engineer who can shape and influence our product.