Job Description
Machine Learning Engineer – Multimodal LLMs (Speech Focus)
About the Role
ConnexAI is developing a transformative product that enables speech-to-speech capabilities in large language models. This is a greenfield project with significant scope to influence both its technical architecture and product impact from the ground up.
We’re looking for a hands-on Machine Learning Engineer with deep expertise in building, optimising, and deploying ML systems, particularly in speech, LLMs, or multimodal learning. You will turn cutting-edge research into production-ready models, enabling real-time, scalable, and reliable multimodal AI experiences.
What You'll Be Doing
* Building and productising machine learning models for speech-to-text, text-to-speech, and speech-to-speech tasks
* Translating academic and internal research into scalable, maintainable code and services
* Developing and maintaining training pipelines, inference services, and deployment workflows
* Implementing robust data pipelines for sourcing, preprocessing, and versioning multimodal datasets
* Collaborating with research scientists to refine model architectures and integrate the latest techniques into production
* Evaluating model performance with custom metrics and developing automated test frameworks for ML systems
* Contributing to MLOps tooling and infrastructure to support model lifecycle management and monitoring in production
* Working closely with product, research, and backend engineering to deliver seamless end-to-end features
What We're Looking For
* Strong engineering background with experience shipping ML systems to production
* Deep familiarity with speech technologies (ASR, TTS), LLMs, or multimodal machine learning
* Proficiency in Python, with expertise in ML frameworks such as PyTorch
* Experience building scalable ML pipelines (training, validation, deployment, monitoring)
* Knowledge of Docker, Kubernetes, and ML deployment platforms
* Comfort reading recent research papers and adapting them into performant implementations
* Strong debugging and optimisation skills, particularly around model inference speed and resource usage
* Experience working in cross-functional teams and contributing to engineering culture and best practices
* Bonus: experience with streaming audio processing, real-time systems, or speech synthesis engines
Why Join Us?
* Be part of a foundational team building novel, multimodal AI capabilities
* Shape the architecture and product direction from an early stage
* Work in a fast-moving, collaborative environment with a strong focus on execution and innovation
* Grow alongside a rapidly scaling AI startup