Operations & Support Engineer (HPC)

Newport (Newport)
asobbi
Support engineer
Posted: 20h ago
Offer description

About the Company

A rapidly growing cloud provider is redefining high-performance computing with cutting-edge GPUaaS, delivering scalable, enterprise-grade AI infrastructure at unmatched efficiency. With deep ties to Nvidia, they’re quickly becoming a powerhouse in the US and Europe’s AI and ML ecosystem, providing solutions for HPC, AI, and deep learning workloads.


Role Overview

As the Principal HPC Support Engineer, you will play a pivotal role in maintaining and supporting high-performance computing environments on bare-metal infrastructure, primarily serving clients in research, higher education, and enterprise AI sectors. You will focus on both the software and networking aspects of HPC deployments, ensuring that large-scale GPU clusters remain operational, secure, and optimized for client needs.


Key Responsibilities


System Maintenance and Performance Optimization

• Manage, maintain, and tune bare-metal HPC clusters running Linux-based operating systems (e.g., Fedora, Debian, Ubuntu).

• Optimize Nvidia GPU compute environments, including CUDA, NCCL, and GPU resource management in multi-node HPC clusters.

• Oversee high-speed networking configurations, including InfiniBand (Mellanox), RDMA, and Ethernet fabric tuning for low-latency HPC workloads.

• Configure and fine-tune HPC schedulers (e.g., Slurm, OpenPBS, SGE) for optimal GPU workload distribution.

• Implement containerization strategies (Podman, Docker) and orchestration platforms (K3s, Kubernetes) for managing distributed AI/ML workloads.


Networking and Infrastructure Support

• Configure, monitor, and troubleshoot high-performance network fabrics, ensuring low-latency, high-throughput communication between GPU nodes.

• Deploy and maintain InfiniBand, RoCE, and high-speed Ethernet for HPC and AI clusters.

• Collaborate with networking teams to optimize routing, switching, and load balancing for distributed computing environments.

• Work closely with Nvidia engineers and system architects to implement GPUDirect Storage, NVLink, and Magnum IO for accelerated workloads.


Security, Automation, and Monitoring

• Maintain authentication and authorization systems such as Active Directory, OpenLDAP, and Keycloak.

• Automate system provisioning and configuration using Ansible, Terraform, or other Infrastructure-as-Code tools.

• Monitor system performance using Prometheus, Grafana, and the ELK Stack, identifying and resolving bottlenecks in GPU workloads.

• Implement security best practices for multi-tenant HPC clusters, ensuring compliance with industry standards.


Troubleshooting and Client Support

• Serve as the lead technical resource for diagnosing and resolving complex software, networking, and hardware issues in large-scale GPU clusters.

• Analyze logs, conduct performance profiling, and debug CUDA, MPI, and RDMA-related issues.

• Work closely with AI/ML research teams, cloud engineers, and enterprise clients to optimize workload performance.


Collaboration and Process Improvement

• Support the ongoing development of internal HPC test environments and customer POCs.

• Work cross-functionally with Service Desk, Operations, and Service Delivery Management to ensure seamless service.

• Provide technical documentation, training, and mentorship to junior team members.

© 2025 Jobijoba - All Rights Reserved
