Overview
Senior Machine Learning Operations Engineer, London (2x a week onsite), £500 p/d (Outside IR35), 6-month contract
Responsibilities
* Evolve and scale the machine learning platform to support high-throughput model inference and fast iteration cycles.
* Work closely with ML engineers and product teams to align infrastructure with evolving project needs, research and implement cutting-edge MLOps practices, and mentor colleagues by sharing expertise in cloud operations and ML engineering best practices.
* Manage GPU-powered Kubernetes clusters, improve automation pipelines, and ensure system reliability. Build Kubernetes clusters from scratch, configuring them manually with tools such as kubeadm, and deploy applications with Helm.
Qualifications
* MLOps & Kubernetes: GPU-enabled Kubernetes cluster management, including building clusters from scratch with kubeadm and deploying applications with Helm.
* Programming: Python or Go for ML automation workflows.
* Containerization: Docker and containerized application deployment.
* Cloud: AWS experience supporting ML workloads.
* CI/CD & Automation: ArgoCD, GitHub Actions, Infrastructure-as-Code (Terraform).
* Monitoring & Observability: Prometheus, Grafana, and other cloud-native observability stacks.
* ML Lifecycle: Production experience with experimentation, training, deployment, versioning, and monitoring.
* Reliability & Support: On-call participation, incident response, and system optimization.
Details
* Location: London (2x a week onsite)
* Day rate: £500 p/d (Outside IR35)
* Duration: 6 months