Company Description
Voice-Swap is building the future of AI voice technology for the creative industries, with ethics, artist partnership, and cutting-edge engineering at the core. We work directly with musicians, voice-over artists, and media partners to develop ethically licensed, production-grade AI voice models with uncompromising speaker likeness and perceptual quality.
We are now looking for a Senior Machine Learning Engineer (Speech AI) to help us push high-fidelity speech synthesis and voice conversion systems to production scale. As an early-stage, fast-moving company, we value people who take ownership, move quickly, and are comfortable operating with both autonomy and responsibility.
Learn more at https://www.voice-swap.ai.
Role Description
This is a full-time remote role for a Senior Machine Learning Engineer at Voice-Swap.
You will:
* Implement neural speech synthesis models, prioritising speaker likeness and naturalness
* Write model inference API scripts for product deployment
* Write scripts for data preprocessing and model evaluation
* Work directly with clients on text-to-speech and/or voice conversion model projects
* Script and support professional voice-over data collection sessions
* Reimplement and adapt architectures from scientific papers into production-ready systems
* Contribute to improving training efficiency and deployment performance
This role requires someone comfortable moving between research papers, GPU training runs, and production APIs.
Qualifications
* Solid understanding of the fundamental concepts of Machine Learning and Deep Learning (Transformers, CNNs, RNNs)
* Strong grounding in mathematics, audio signal processing, speech processing, or NLP
* Experience with ML frameworks (PyTorch or TensorFlow)
* Experience training and deploying models on cloud services (AWS, GCP, etc.)
* Experience reimplementing architectures from scientific papers
* Comfortable with Git & GitHub workflows
* Strong software engineering discipline and attention to reproducibility
Bonus Skills
* Experience in speech synthesis (text-to-speech and/or voice conversion)
* Training and inference optimisation (e.g., quantisation techniques)
* MS or PhD in Computer Science or Machine Learning, or 3+ years of relevant experience
* Publications in top-tier speech / NLP / signal processing conferences (Interspeech, ICASSP, ASRU, SLT, EUSIPCO, ACL, etc.)
* Music production or audio engineering experience
Who Thrives Here
* You enjoy working in a startup environment where priorities can evolve quickly
* You are proactive and don’t wait to be told what to do
* You are comfortable owning problems from research to production
* You care about audio quality and technical excellence
* You’re collaborative, reliable, and enjoyable to work with
Note: With your CV, please include a brief description of the project you are proudest of (GitHub repo, arXiv paper link, or short write-up).