Overview
Work within our client's machine learning team to deploy and optimize models for applications such as low-latency speech recognition and large language models (LLMs). The initial focus will be on improving the client's speech recognition model's training pipeline on multi-GPU systems to boost performance and quality.
Responsibilities:
- Train and deploy state-of-the-art ML models.
- Apply optimization techniques (distillation, pruning, quantization).
- Enhance speech models with features such as diarization, multilingual support, and keyword boosting.
- Optimize models for low-latency inference on accelerators.
- Improve training workflows and GPU utilization.
- Use data augmentation to improve performance.
- Stay updated on ML research to guide strategy.
Requirements:
- Master's or PhD in a relevant field with strong ML foundations.
- Experience training ML models for production use.
- Proficiency with PyTorch or TensorFlow.
- Experience handling large (multi-terabyte) datasets.
- Familiarity with Linux, version control, and CI/CD systems.
- Knowledge of model compression techniques (e.g., reduced precision).