HPC Engineer - Contract via Umbrella - Cambridge/Hybrid
Location: Cambridge, hybrid (ideally 3 days onsite)
Market rate
Description
We're looking for an HPC Engineer to join our team in the United Kingdom on a hybrid basis (ideally 3 days onsite). In this role, you will help build and operate industry-leading high-performance computing (HPC) capabilities, including application build frameworks, containerized applications and cloud-based services. You will work closely with the scientific community to deliver high-quality HPC services, using automation, infrastructure-as-code and DevOps practices to ensure scalability, reliability and performance in a rapidly evolving HPC landscape.
Responsibilities
* Design, implement and maintain robust platform infrastructure using Infrastructure as Code (IaC) tools such as Terraform
* Develop, deliver and operate research computing services and applications
* Take a Site Reliability Engineering approach to HPC services, managing development, deployment, monitoring and incident response end-to-end
* Solve complex technical problems related to HPC services and user workflows
* Drive innovative computational solutions and exploit emerging technologies
* Administer large-scale cluster and server computing environments and related software (eg, Slurm, LSF, Grid Engine)
* Apply DevOps practices and agile methodologies to HPC operations
* Manage virtualized private cloud resources (eg, OpenStack)
* Implement and administer large-scale parallel filesystems (eg, Weka, GPFS, Lustre)
* Use configuration management tools (eg, Ansible, Salt, Puppet) for IT operations
* Develop scripts and tools for HPC and DevOps operations using Bash and Python
Requirements
* 10+ years of experience operating or engineering large-scale computing environments (HPC, HTC or BC)
* Strong understanding of Linux system administration, TCP/IP stack and storage subsystems
* Experience with high-speed networks (eg, InfiniBand)
* Proven experience with configuration management and automation frameworks
* Hands-on experience with DevOps processes and agile methodologies
* Track record of driving innovative computational solutions and exploiting emerging technologies
* Experience in developing and managing relationships with third-party suppliers
* Scientific degree and/or experience in computationally intensive scientific data analysis
* Previous experience in large-scale HPC environments (more than 10,000 cores)
Additional
* Experience with public cloud infrastructure (AWS, Azure, GCP)
* Experience managing virtualized private cloud environments (eg, OpenStack)
* Familiarity with container technologies (LXD, Singularity, Docker, Kubernetes)
* Development experience with programming languages and tools (Java/C++, Python/Ruby/Perl, SQL)
* Experience with HashiCorp tools (Terraform, Vault, Consul, Nomad)