Job Description
HPC Engineer
We are seeking an experienced High Performance Computing (HPC) Engineer to design, maintain, and optimise large-scale computing environments that support data-intensive and compute-heavy workloads. You will work closely with researchers, developers, and infrastructure teams to ensure high availability, performance, and scalability of HPC systems.
Key Responsibilities
* Design, deploy, and manage HPC clusters (on-premises, cloud, or hybrid)
* Install, configure, and optimise job schedulers (e.g. Slurm, PBS, LSF)
* Tune system performance for CPU, GPU, memory, storage, and network workloads
* Support users with application optimisation and parallelisation
* Automate system administration using scripting and configuration management tools
* Monitor system health, capacity, and performance
* Troubleshoot hardware, software, and performance issues
* Collaborate on future architecture planning and upgrades
* Maintain system documentation and promote best practices
Required Skills & Experience
1. Strong Linux system administration experience
2. Hands-on experience with HPC environments and parallel computing
3. Knowledge of MPI, OpenMP, and/or CUDA
4. Experience with job schedulers (Slurm preferred)
5. Familiarity with high-speed interconnects (InfiniBand, Omn...