Job Description

Looking to push the boundaries of generative AI for real-time interaction?

You'll be joining a well-funded startup working on Multimodal AI where voice, vision, and language come together. They're building generative models for natural conversational experiences that need to perform in real-time.

Your mission

You'll be building and optimising diffusion or flow-matching models that power their speech and audio generation. This means developing production-ready architectures that can generate controllable, high-quality output at scale. You'll own the full research-to-production pipeline - from architecture design and training through deployment and optimisation. Your work will directly impact how millions of AI characters sound and interact.

Your focus

- Design and train large-scale diffusion or flow-matching models
- Develop novel architectures and training techniques to improve controllability and quality
- Build evaluation systems to measure generation quality and model behaviour
- Work from low-level performance optimisations to high-level model design

What you'll bring

- Proven track record building diffusion models or flow-matching systems
- Experience training large models (3B parameters) with distributed systems

Nice to have

- Experience with audio or speech generation
- Publications or open-source contributions in diffusion models or generative AI

Remote in Europe with competitive comp and stock.