
Operations & Support Engineer (HPC)

Guernsey
asobbi
Support engineer
Posted: 15h ago
Offer description

About the Company

A rapidly growing cloud provider is redefining high-performance computing with cutting-edge GPUaaS, delivering scalable, enterprise-grade AI infrastructure at unmatched efficiency. With deep ties to Nvidia, they're quickly becoming a powerhouse in the US and European AI and ML ecosystems, providing solutions for HPC, AI, and deep learning workloads.

Role Overview

As the Principal HPC Support Engineer, you will play a pivotal role in maintaining and supporting high-performance computing environments on bare-metal infrastructure, primarily serving clients in research, higher education, and enterprise AI sectors. You will focus on both the software and networking aspects of HPC deployments, ensuring that large-scale GPU clusters remain operational, secure, and optimized for client needs.

Key Responsibilities

System Maintenance and Performance Optimization

Manage, maintain, and tune bare-metal HPC clusters running Linux-based operating systems (e.g., Fedora, Debian, Ubuntu).

Optimize Nvidia GPU compute environments, including CUDA, NCCL, and GPU resource management in multi-node HPC clusters.

Oversee high-speed networking configurations, including InfiniBand (Mellanox), RDMA, and Ethernet fabric tuning for low-latency HPC workloads.

Configure and fine-tune HPC schedulers (e.g., Slurm, OpenPBS, SGE) for optimal GPU workload distribution.

Implement containerization strategies (Podman, Docker) and orchestration platforms (K3s, Kubernetes) for managing distributed AI/ML workloads.

Networking and Infrastructure Support

Configure, monitor, and troubleshoot high-performance network fabrics, ensuring low-latency, high-throughput communication between GPU nodes.

Deploy and maintain InfiniBand, RoCE, and high-speed Ethernet for HPC and AI clusters.

Collaborate with networking teams to optimize routing, switching, and load balancing for distributed computing environments.

Work closely with Nvidia engineers and system architects to implement GPUDirect Storage, NVLink, and Magnum IO for accelerated workloads.

Security, Automation, and Monitoring

Maintain authentication and authorization systems such as Active Directory, OpenLDAP, and Keycloak.

Automate system provisioning and configuration using Ansible, Terraform, or other Infrastructure-as-Code tools.

Monitor system performance using Prometheus, Grafana, and the ELK Stack, identifying and resolving bottlenecks in GPU workloads.

Implement security best practices for multi-tenant HPC clusters, ensuring compliance with industry standards.

Troubleshooting and Client Support

Serve as the lead technical resource for diagnosing and resolving complex software, networking, and hardware issues in large-scale GPU clusters.

Analyze logs, conduct performance profiling, and debug CUDA, MPI, and RDMA-related issues.

Work closely with AI/ML research teams, cloud engineers, and enterprise clients to optimize workload performance.

Collaboration and Process Improvement

Support the ongoing development of internal HPC test environments and customer POCs.

Work cross-functionally with Service Desk, Operations, and Service Delivery Management to ensure seamless service.

Provide technical documentation, training, and mentorship to junior team members.



© 2025 Jobijoba - All Rights Reserved
