## Senior Machine Learning Applications and Compiler Engineer

* Locations: UK, Cambridge; UK, Remote
* Time type: Full time
* Posted on: Today
* Job requisition ID: JR2013261

NVIDIA is seeking engineers to develop algorithms and optimizations for our inference and compiler stack. You will work at the intersection of large-scale systems, compilers, and deep learning, crafting how neural network workloads map onto future NVIDIA platforms. This is your chance to be part of something outstandingly innovative!

**What you’ll be doing:**

* Build, develop, and maintain high-performance runtime and compiler components, focusing on end-to-end inference optimization.
* Define and implement mappings of large-scale inference workloads onto NVIDIA’s systems.
* Extend and integrate with NVIDIA’s software ecosystem, contributing to libraries, tooling, and interfaces that enable seamless deployment of models across platforms.
* Benchmark, profile, and monitor key performance and efficiency metrics to ensure the compiler generates efficient mappings of neural network graphs to our inference hardware.
* Collaborate closely with hardware architects and design teams to feed back software observations, influence future architectures, and co-design features that unlock new performance and efficiency points.
* Prototype and evaluate new compilation and runtime techniques, including graph transformations, scheduling strategies, and memory/layout optimizations tailored to spatial processors.
* Publish and present technical work on novel compilation approaches for inference and related spatial accelerators at top-tier ML, compiler, and computer architecture venues.

**What we need to see:**

* MS or PhD in Computer Science, Electrical/Computer Engineering, or a related field, or equivalent experience, with 5 years of relevant experience.
* Strong software engineering background with proficiency in systems-level programming (e.g., C/C++ and/or Rust) and solid CS fundamentals in data structures, algorithms, and concurrency.
* Hands-on experience with compiler or runtime development, including IR design, optimization passes, or code generation.
* Experience with LLVM and/or MLIR, including building custom passes, dialects, or integrations.
* Familiarity with deep learning frameworks such as TensorFlow and PyTorch, and experience working with portable graph formats such as ONNX.
* Solid understanding of parallel and heterogeneous compute architectures, such as GPUs, spatial accelerators, or other domain-specific processors.
* Strong analytical and debugging skills, with experience using profiling, tracing, and benchmarking tools to drive performance improvements.
* Excellent communication and collaboration skills, with the ability to work across hardware, systems, and software teams.
* Ideal candidates will have direct experience with MLIR-based compilers or other multi-level IR stacks, especially in the context of graph-based deep learning workloads.

**Ways to stand out from the crowd:**

* Prior work on spatial or dataflow architectures, including static scheduling, pipeline parallelism, or tensor parallelism at scale.
* Contributions to open-source ML frameworks, compilers, or runtime systems, particularly in areas related to performance or scalability.
* Demonstrated research impact, such as publications or presentations at conferences like PLDI, CGO, ASPLOS, ISCA, MICRO, MLSys, NeurIPS, or similar.
* Experience with large-scale distributed AI inference or training systems, including performance modeling and capacity planning for multi-rack deployments.