AI performance and efficiency will be the major tech theme of the next decade. We are building systems that autonomously discover, test, and ship state-of-the-art GPU kernels. Our mission is to fully automate this process by combining LLMs with evolutionary methods. We have just closed an unannounced $4.2M pre-seed round from top-tier funds and technical angels, and we have proven results with large, sophisticated enterprise partners on custom neural architectures.
We believe that revolutionary breakthroughs often happen at the intersections of fields. We are not a research lab, nor are we an AI agents company. We’re working at the intersection of LLMs and evolutionary computing to build self-improving systems. We’re looking for exceptionally talented engineers and researchers to join us on this epic quest.
Responsibilities
* Write SOTA GPU kernels
* Own complex production ML/AI systems end-to-end
* Understand how kernel-level gains translate to wall-clock improvements in production
* Build the infrastructure that lets LLM agents iterate unsupervised for days: compilation, correctness checking, benchmarking, scoring, lineage tracking and more
* Design the evolutionary search: fitness landscapes, variation operators, population management, selection pressure, stagnation detection and the balance between exploration and exploitation over multi-day autonomous runs
* Communicate and share ideas through high-quality documentation, technical meet-ups and blogs
For lead candidates: Hire and mentor a small team of exceptional engineers and researchers
Qualifications
* You've written and shipped high-performance or SOTA CUDA kernels
* You have a deep understanding of mixed precision, quantisation (INT4, INT8, FP8, MXFP4, block-scaled formats), kernel fusion and distributed parallelism strategies (TP, PP, CP)
* You've made deliberate choices about tiling, memory access patterns, warp-level primitives, and instruction scheduling
* You've traced performance cliffs to their root cause through profiler output
* You've worked with CuTe, Triton, Helion or equivalent abstractions, and know when to dive into PTX
* You understand GPU architecture across generations: registers through L2, warp execution, divergence costs, occupancy tradeoffs, what changed between Hopper and Blackwell and why it matters
* You know transformers at the implementation level: attention variants, KV cache strategies, quantisation schemes and how they shape kernel design
* You've worked with production inference or training frameworks such as vLLM or Megatron-LM
* You've built performance-critical infrastructure before: compilers, profilers, auto-tuners or search systems
* You have real intuition for evolutionary methods, fitness landscapes, and what makes variation operators work on hard combinatorial problems
* You're familiar with new or esoteric ideas such as Neural Algorithmic Reasoning, Geometric Deep Learning, Category Theory, Neuroevolution, Megakernels, or the work of François Chollet, Kenneth Stanley, Jeff Clune, Jürgen Schmidhuber, David Ha and Christian Szegedy
Bonus
* Open-source kernel contributions (FlashAttention, FlashInfer, vLLM, Unsloth, Liger-Kernel, ThunderKittens)
* Publications in ML/AI, kernel optimisation or evolutionary methods (NeurIPS, ICLR, CVPR, GECCO or equivalent)
* Experience with other hardware (AMD, Apple silicon via MLX, edge devices)
* Familiarity with TileLang, Helion, cuTile
* Experience building agentic systems
* Public work showing deep expertise in evolutionary algorithms, ML or GPU/hardware, e.g. on KernelBench, Kaggle, GitHub, blog posts or Stack Overflow answers
* HPC experience
This is a full-time, permanent role. Competitive salary + significant founding equity. Flexible on-site, hybrid or remote working; Dublin, London, Paris or NYC preferred.
If this sounds exciting to you, apply via the link below or send a PDF of your CV/résumé to jobs@geometric.so.