Overview
Work within our client's machine learning team to deploy and optimize models for applications like low-latency speech recognition and large language models (LLMs). The initial focus will be on improving the client's speech recognition training pipeline on multi-GPU systems to boost performance and quality.
Hybrid working: 3 days onsite, 2 days WFH.
No sponsorship; candidates must have the right to work in the UK.
Responsibilities:
* Train and deploy state-of-the-art ML models.
* Apply optimization techniques (distillation, pruning, quantization).
* Enhance speech models with features such as diarization, multilingual support, and keyword boosting.
* Optimize models for low-latency inference on accelerators.
* Improve training workflows and GPU utilization.
* Use data augmentation to improve performance.
* Stay updated on ML research to guide strategy.
Requirements:
* Master’s or PhD in a relevant field with strong ML foundations.
* Experience training ML models for production use.
* Proficiency with PyTorch or TensorFlow.
* Experience handling large (multi-terabyte) datasets.
* Familiarity with Linux, version control, and CI/CD systems.
* Knowledge of model compression (e.g., reduced precision).