CUDA/GPU Systems Engineer (ML Infra)
Remote
We’re partnering with a well-funded AI company building advanced multi-agent systems. They’re looking for engineers with experience working close to the GPU layer.
This is a CUDA-heavy role focused on improving the performance of large-scale machine learning systems.
You’ll be solving problems like:
* Optimising CUDA kernels for ML workloads
* Improving GPU utilisation and memory efficiency
* Profiling and debugging GPU bottlenecks
* Accelerating training and inference pipelines
* Working alongside ML researchers building multi-agent AI systems
Essentials:
* Strong CUDA / GPU programming experience
* Experience writing or optimising CUDA kernels
* Strong C++ and/or Python
* Experience working with machine learning systems
* ML experience may come from LLMs, computer vision, speech, recommendation systems, or other domains
Nice to have:
* PyTorch / TensorFlow internals
* Distributed training (NCCL, DeepSpeed, Megatron, Horovod, etc.)
* GPU performance engineering or HPC background