Voice‑Swap is building the future of AI voice technology for the creative industries - with ethics, artist partnership, and cutting‑edge engineering at the core. We work directly with musicians, voice‑over artists, and media partners to develop ethically licensed, production‑grade AI voice models with uncompromising speaker likeness and perceptual quality.
We are now looking for a Senior Machine Learning Engineer (Speech AI) to help us push high‑fidelity speech synthesis and voice conversion systems to production scale. As an early‑stage, fast‑moving company, we value people who take ownership, move quickly, and are comfortable operating with both autonomy and responsibility.
Learn more at https://www.voice-swap.ai.
Role Description
This is a full‑time remote role for a Senior Machine Learning Engineer at Voice‑Swap.
You will:
* Implement neural speech synthesis models, prioritising speaker likeness and naturalness
* Write scripts for data preprocessing and model evaluation
* Work directly with clients on text‑to‑speech and/or voice conversion model projects
* Script and support professional voiceover data collection sessions
* Reimplement and adapt architectures from scientific papers into production‑ready systems
* Contribute to improving training efficiency and deployment performance
This role requires someone comfortable moving between research papers, GPU training runs, and production APIs.
Qualifications
* Solid understanding of the fundamental concepts of Machine Learning and Deep Learning (Transformers, CNNs, RNNs)
* Strong grounding in mathematics, audio signal processing, speech processing or NLP
* Experience with ML frameworks (PyTorch or TensorFlow)
* Experience training and deploying models on cloud services (AWS, GCP, etc.)
* Experience reimplementing architectures from scientific papers
* Comfortable with Git & GitHub workflows
* Strong software engineering discipline and attention to reproducibility
Bonus Skills
* Experience in speech synthesis (text‑to‑speech and/or voice conversion)
* Training and inference optimisation (e.g., quantisation techniques)
* MS or PhD in Computer Science or Machine Learning, or 3+ years of relevant experience
* Publications in top‑tier speech / NLP / signal processing conferences (Interspeech, ICASSP, ASRU, SLT, EUSIPCO, ACL, etc.)
* Music production or audio engineering experience
Who Thrives Here
* You enjoy working in a startup environment where priorities can evolve quickly
* You are proactive and don’t wait to be told what to do
* You are comfortable owning problems from research to production
* You care about audio quality and technical excellence
* You’re collaborative, reliable, and enjoyable to work with
Note: Along with your CV, please tell us briefly about the project you are proudest of (a GitHub repo, an arXiv paper link, or a short description).