Platform Engineer
Openings: 3
Location: London (Hybrid)
Employment Type: Full-time
The Opportunity
We are building a next-generation computational platform that powers large-scale machine learning, data science, and scientific discovery. Our teams work at the intersection of cloud infrastructure, high-performance computing, and data engineering, enabling researchers and ML practitioners to move faster from experimentation to real-world impact.
This role sits at the heart of the platform: designing, scaling, and operating systems that support GPU-accelerated workloads, batch pipelines, and data-intensive applications.
Who This Role Is For (Choose Your Strength)
We’re open to different profiles and will shape the role around your strengths:
🔹 AI Platform / ML Infrastructure Engineers
* Kubernetes-based compute platforms
* GPU scheduling, batch & distributed workloads
* Supporting ML training, inference, and experimentation at scale
🔹 HPC / GPU Engineers
* Job schedulers, MPI, multi-node workloads
* Hybrid cloud and on-prem compute
* Performance, reliability, and cost optimisation
🔹 Strong Data Engineers
* Large-scale data pipelines and data platforms
* Data reliability, orchestration, and observability
* Close collaboration with ML and research teams
What You’ll Work On
* Designing and evolving Kubernetes-based compute platforms across hybrid and multi-cloud environments
* Building and operating GPU-enabled infrastructure for ML and scientific workloads
* Developing and maintaining core platform services, APIs, and internal tooling
* Improving CI/CD pipelines and Infrastructure-as-Code workflows
* Implementing monitoring, alerting, and reliability engineering practices
* Applying best practices for security, data protection, backup, and disaster recovery
* Partnering closely with ML engineers, data scientists, and researchers to unblock compute and data challenges
What We’re Looking For
* Strong experience in one or more of:
  * Platform / infrastructure engineering
  * ML infrastructure or MLOps
  * HPC or GPU compute
  * Data engineering at scale
* Solid experience with Linux and cloud environments
* Hands-on work with Kubernetes or distributed systems
* Experience with Python (or similar) for automation or services
* Familiarity with CI/CD, Git-based workflows, and automation
* Strong problem-solving skills and a collaborative mindset
Bonus
* Terraform or other IaC tools
* Slurm, Kueue, Ray, Spark, or similar systems
* GPU tooling (CUDA, NVIDIA operators, schedulers)
* Experience supporting ML training or data science teams