We are looking for an excellent ML Ops Engineer to join our research and development team.
Key Responsibilities
You will join the ML Operations team, which supports the ML Development team in building leading-edge motion capture products by provisioning and maintaining a modern ML Operations stack.
This stack covers data acquisition pipelines, data management, and ML model training infrastructure (software and on-prem hardware). We use both self-managed on-prem systems and AWS infrastructure.
You will have opportunities to guide the technical direction of the ML Ops team, suggest new areas of development, and potentially lead your own project.
Required Skills, Knowledge and Expertise
You will have relevant academic (research Master's level) and/or industry experience.
* Excellent knowledge of, and experience managing, an on-prem Kubernetes cluster.
* Excellent knowledge of Kubeflow and similar systems, e.g. MLflow.
* Good programming ability in Python and familiarity with Linux systems, including scripting and system configuration.
* Experience using AWS services, e.g. Cognito, S3, EC2, Lambda.
* Experience with ML toolkits, e.g. PyTorch and Lightning, along with a solid understanding of how these fit into ML Ops pipelines and tools.
* Ability to design and implement ML Ops solutions covering many different technologies.
Desirable Skills
* Background in DevOps with exposure to CI systems, e.g. Jenkins
* Familiarity with infrastructure as code, e.g. Ansible
* Experience, aptitude, and a desire to work with human motion, sport, animation tools and techniques.
* Familiarity with C