
Platform Engineer - HPC, AI and ML

Slough
Cloud People
Platform engineer
Posted: 17 November
Offer description

Platform Engineer – HPC, AI and ML

Up to £80,000 plus benefits

Onsite – Kensington, London


Company and Role

This is an opportunity to join a global technology and AI solutions provider delivering some of the most advanced computing platforms in the world. You will play a leading role in the design, build and long-term support of a next-generation AI and Machine Learning platform, built on cutting-edge High Performance Computing (HPC) infrastructure for one of the UK’s most prestigious research environments.


As a Platform Engineer – HPC, AI and ML, you will be responsible for building and optimising a high-performance platform purpose-built for AI, ML, LLM and Generative AI workloads. You will lead on architecture, deployment and performance tuning using technologies such as Kubernetes, NVIDIA Run:AI, Ubuntu, Weka NeuralMesh and HGX B200 GPU nodes.


Once live, you will take ownership of the platform’s operation and evolution, ensuring it delivers consistent world-class performance for advanced research workloads. It is a rare opportunity to build a complex HPC environment from the ground up and then own it, ensuring it continues to power the next generation of AI-driven innovation.


Why This Role Stands Out

• Be part of one of the UK’s most advanced AI and HPC platform projects

• Build and then support a world-class infrastructure enabling AI, ML, LLM and Generative AI research

• Collaborate with global technology leaders including NVIDIA, HPE, Canonical and Weka

• Onsite role in Kensington, London within a pioneering research and innovation environment

• Salary up to £80,000 with excellent opportunities for growth in HPC and AI infrastructure engineering


What You’ll Be Doing

• Designing, deploying and configuring a complete AI and ML operations platform within a large-scale HPC environment

• Installing and optimising Ubuntu (Canonical) across GPU and non-GPU compute nodes

• Implementing and managing Kubernetes for container orchestration and performance at scale

• Installing and configuring the NVIDIA GPU Operator, Network Operator and Run:AI orchestration platform

• Integrating Run:AI with Kubernetes clusters to deliver scalable GPU utilisation

• Supporting deployment of HGX B200 GPU nodes (96 NVIDIA B200 GPUs) and associated infrastructure

• Managing Weka NeuralMesh distributed AI storage for high-speed data access and resilience

• Implementing CI/CD and MLOps pipelines using Argo Workflows, Jenkins and GitHub

• Monitoring platform performance using Zabbix, Prometheus and Grafana

• Integrating SAN and InfiniBand networking to achieve high throughput and reliability

• Creating detailed documentation and performing knowledge transfer to operations teams

• Providing ongoing platform support, patching, troubleshooting and continuous improvement


What You’ll Bring

• Proven experience designing, deploying and supporting HPC or large-scale compute environments for AI and ML workloads

• Strong understanding of Ubuntu server administration, networking and performance tuning

• Hands-on experience with Kubernetes and GPU-enabled workloads

• Practical knowledge of NVIDIA GPU technologies, particularly GPU Operator and Run:AI

• Familiarity with distributed storage and AI data systems such as Weka NeuralMesh

• Experience with CI/CD and MLOps pipelines using Argo, Jenkins or GitHub

• Knowledge of HPC networking including SAN and InfiniBand integration

• Strong troubleshooting and documentation skills with a collaborative mindset


Desirable Experience

• Certifications in Kubernetes, NVIDIA or HPC infrastructure technologies

• Experience in research, academic or scientific computing environments

• Understanding of AI and ML workflows, neural network training and large language models

• Familiarity with HPE, NVIDIA, Aarna and Digital Realty platforms


If you are passionate about building and operating large-scale computing environments and want to play a key role in delivering one of the UK’s most advanced HPC and AI platforms, this is your opportunity to shape the future of research and machine learning infrastructure.



© 2025 Jobijoba - All Rights Reserved
