Job Description
HPC Engineer
We are seeking an experienced High Performance Computing (HPC) Engineer to design, maintain, and optimise large-scale computing environments that support data-intensive and compute-heavy workloads. You will work closely with researchers, developers, and infrastructure teams to ensure high availability, performance, and scalability of HPC systems.
Key Responsibilities
* Design, deploy, and manage HPC clusters (on-premises, cloud, or hybrid)
* Install, configure, and optimise job schedulers (e.g. Slurm, PBS, LSF)
* Tune system performance for CPU, GPU, memory, storage, and network workloads
* Support users with application optimisation and parallelisation
* Automate system administration using scripting and configuration management tools
* Monitor system health, capacity, and performance
* Troubleshoot hardware, software, and performance issues
* Collaborate on future architecture planning and upgrades
* Maintain documentation and best practices
Required Skills & Experience
* Strong Linux system administration experience
* Hands-on experience with HPC environments and parallel computing
* Knowledge of MPI, OpenMP, and/or CUDA
* Experience with job schedulers (Slurm preferred)
* Familiarity with high-speed interconnects (InfiniBand, Omni-Path)
* Experience with scripting languages (Bash, Python)
* Understanding of performance profiling and optimisation techniques
Desirable Skills
* Experience with GPUs and accelerator-based systems
* Knowledge of cloud HPC (AWS, Azure, GCP)
* Experience with containers (Singularity/Apptainer, Docker)
* Experience with configuration management tools (Ansible, Puppet, Chef)
* Experience supporting scientific or research workloads