Senior ML Infrastructure Engineer - AI

Oxford
Permanent
Ellison Institute of Technology Oxford
Infrastructure engineer
Posted: 22 December
Offer description

The Ellison Institute of Technology (EIT) Oxford aims to have a global impact by fundamentally reimagining the way science and technology translate into end‑to‑end solutions and delivering these solutions through programmes and platforms that respond to humanity's most challenging problems. EIT Oxford will ensure scientific discoveries and pioneering science are turned into products that benefit society worldwide, with a long‑term vision of commercialising those solutions for sustainability. Led by a world‑class faculty of scientists, technologists, policy makers, economists and entrepreneurs, the EIT seeks to develop and deploy commercially sustainable solutions across its four Humane Endeavours: Health, Medical Science & Generative Biology; Food Security & Sustainable Agriculture; Climate Change & Managing Atmospheric CO2; and Artificial Intelligence & Robotics. The campus, slated for completion in 2027, will span more than 300,000 sq ft of research laboratories, educational and gathering spaces, and will later expand into a 2 million sq ft campus across Oxford Science Park. Designed by Foster + Partners, it will host up to 7,000 people, featuring autonomous labs and purpose‑built facilities to spark interdisciplinary collaboration.


Requirements

Our MLOps team is building the cloud and compute foundation that enables scientific breakthroughs. We deliver reliable, secure platforms and self‑service guardrails that accelerate experimentation and turn ideas into results faster, at scale, and with confidence.


Day-to-day, you might:

* Build, operate, and continuously optimise our high‑performance GPU training and inference clusters, focusing on robust, high‑availability scheduling, isolation, and automated lifecycle management.
* Drive systems design and implementation for high‑throughput data paths, optimising I/O, caching, and data locality across compute and storage (including our current Lustre implementation).
* Proactively benchmark, profile, and resolve performance bottlenecks across the compute, network, and orchestration layers to maximise efficiency for distributed training and inference.
* Establish comprehensive observability, resilience, and automated security controls to ensure compliance and robust operation of sensitive research environments.
* Partner with Research, Data, and Applied teams to forecast capacity and cost for GPU and storage needs, setting quotas and streamlining ML experimentation pipelines.


What makes you a great fit:

* Proven experience leading the design, build, and operation of high‑performance ML compute clusters at scale.
* A proactive, autonomous approach to systems design and the proven ability and desire to ideate, co‑create and implement optimal solutions.
* Exposure to migrating or transforming ML infrastructure from traditional schedulers to modern, containerised systems.
* Expertise with high‑throughput storage systems for ML/HPC workloads.
* Expert‑level understanding of GPU architecture, high‑speed networking for distributed training, and performance profiling to resolve bottlenecks.
* A solid grasp of infrastructure-as-code (IaC) and CI/CD practices (e.g., Terraform, Argo CD).


Benefits

* Enhanced holiday pay
* Pension
* Life Assurance
* Income Protection
* Private Medical Insurance
* Hospital Cash Plan
* Therapy Services
* Perkbox
* Electric Car Scheme


Why work for EIT

At the Ellison Institute, we believe a collaborative, inclusive team is key to our success. We are building a supportive environment where creative risks are encouraged and everyone feels heard. Valuing emotional intelligence, empathy, respect, and resilience, we encourage people to be curious and to share a commitment to excellence. Join us and make an impact!
