Role: Senior SRE
Skills: Deep Linux, Python scripting, DevOps, Kubernetes
Salary: £500k Plus
Location: London
The ideal candidate comes from a top-tier tech environment (FAANG, elite trading, hyperscale infra). They have experience building technology 0→1, owning systems end-to-end, and working close to the metal. They will operate across everything from bare-metal Linux to modern build and observability stacks.
Overview
Join a core engineering group as Lead Site Reliability Engineer, designing and scaling Linux platforms that underpin ML/AI-driven trading. You will architect and own reliability for massive simulation, HPC, and production workloads, ensuring ultra-reliable, ultra-fast trading systems. This is a hands-on leadership role focused equally on technical depth, strategic decision-making, and driving platform SRE excellence.
Key Responsibilities
* Lead SRE practices for Linux platforms powering low-latency, high-throughput trading workloads.
* Architect, optimize, and tune Linux for performance, resilience, and minimal latency.
* Drive incident response, root cause analysis, and continuous reliability improvement across production systems.
* Oversee system automation and reproducibility—build, deploy, and fleet-manage bare-metal Linux and containerized stacks.
* Manage and enhance Kubernetes clusters, network configuration, and large-scale orchestration.
* Set observability standards; expand monitoring, alerting, and performance metrics across platforms.
* Analyze networking, kernel-level performance, and distributed systems—solving core challenges in a multi-petabyte, multi-cluster environment.
* Build Python tools for automation, reliability engineering, and performance analysis.
* Design highly distributed systems.
What You Will Work On
* Ultra-reliable, high-performance trading infrastructure where every engineering optimization affects trading performance.
* Next-generation simulation and HPC compute pipelines, supporting ML/AI workflows at scale.
* Integration and continuous improvement of internal and open-source tools for automation and reliability.
* Strategic platform direction: shaping foundational systems for critical infrastructure in an elite trading environment.
Team and Culture
* Small, autonomous Linux SRE team with direct ownership and impact.
* Collaborative engagement with quants, researchers, and trading experts to deliver robust platforms.
* A culture built on deep technical ownership, learning, and high standards of performance engineering.
Apply now for an informal confidential chat!