        Overview
Work within our client's machine learning team to deploy and optimize models for applications such as low-latency speech recognition and large language models (LLMs). The initial focus will be on improving the client's speech recognition training pipeline on multi-GPU systems to boost performance and quality.
Responsibilities
 * Train and deploy state-of-the-art ML models.
 * Apply optimization techniques (distillation, pruning, quantization).
 * Enhance speech models with features such as diarization, multilingual support, and keyword boosting.
 * Optimize models for low-latency inference on accelerators.
 * Improve training workflows and GPU utilization.
 * Use data augmentation to improve performance.
 * Stay updated on ML research to guide strategy.
Requirements
 * Master’s or PhD in a relevant field with strong ML foundations.
 * Experience training ML models for production use.
 * Proficiency with PyTorch or TensorFlow.
 * Experience handling large (multi-terabyte) datasets.
 * Familiarity with Linux, version control, and CI/CD systems.
 * Knowledge of model compression (e.g., reduced precision).