🚀 GPU Infrastructure / Performance Engineer
📍 London (Onsite) | 🌍 Visa Sponsorship + Relocation
Join a frontier AI company backed by NVIDIA, building large-scale open-weight foundation models alongside researchers and engineers from DeepMind, OpenAI, Meta, Anthropic, and Google Brain.
⚡ What You’ll Do
* Optimise GPU performance and training efficiency across 1,000+ GPU clusters
* Improve utilisation, throughput, and reliability across distributed training infrastructure
* Build tooling for orchestration, monitoring, scheduling, and observability
* Work closely with research teams to accelerate large-scale model training
🔧 What They’re Looking For
* Deep GPU infrastructure / distributed systems experience
* Strong knowledge of CUDA, NCCL, PyTorch, DeepSpeed, JAX, Megatron-LM, vLLM, etc.
* Experience operating large-scale GPU clusters (1,000+ GPUs)
* Kubernetes, Slurm, or similar orchestration expertise
BONUS: Experience working on NVIDIA Blackwell chips (B200, B300, GB200, GB300)
đź’° Package
* Salary open to candidate expectations
* Meaningful startup equity
* Full visa sponsorship + relocation support