Machine Learning Performance Engineer
We are seeking a highly skilled Machine Learning Performance Engineer to join our team. This is an exciting opportunity for a talented individual to work on cutting-edge AI projects and push the boundaries of what is possible.
Key Responsibilities
* Design and implement high-performance machine learning models using Python, C++, or Rust.
* Optimize GPU performance engineering, including CUDA, PTX/SASS, Tensor Cores, memory hierarchy, and warp-level primitives.
* Collaborate with research and infrastructure teams to develop efficient training methods and deployment strategies.
Requirements
Technical Experience
* Strong background in computer science and software engineering.
* Familiarity with ML frameworks like PyTorch and their internals.
* Proficiency in profiling and debugging tools like NSight, CUDA GDB, nvprof, and NSight Compute.
* Experience with distributed systems and HPC, including Infiniband, NVLink, RoCE, GPUDirect, NCCL, and MPI.
Your Mindset
* A hacker's curiosity: you love breaking things down and figuring out how to make them faster.
* Product intuition: performance isn't abstract to you, it's about real-world impact.
* Collaborative spirit: you're excited to work across research, infra, and open-source teams.
About Us
We value diversity and inclusivity in the workplace. We welcome applicants from all backgrounds and encourage transparency and high-integrity work.