A fast-growing, AI-first company is transforming how people create visual content—powering popular apps used by millions, and helping creators and brands grow through cutting-edge technology.
We're looking for an experienced Infrastructure / ML Platform Engineer to join our Machine Learning Platform team. This team builds and supports the platform that powers advanced AI models, helping bring research into production at scale.
📍 Hybrid role – 3 days onsite in Central London
What you’ll do:
* Design, build, and maintain a scalable and reliable ML serving platform
* Develop cloud infrastructure and internal tools to support research and engineering teams
* Set up and manage CI/CD pipelines and monitoring systems
* Build self-serve tools to simplify deployment and development
* Share best practices across teams and help level up the platform
* Take part in an on-call rotation (weekends included, with extra pay)
What we’re looking for:
* 5+ years of experience running scalable SaaS systems on GCP, AWS, or Azure
* 3+ years of experience with Kubernetes, Helm or Kustomize, and infrastructure-as-code tools like Terraform or Pulumi
* Experience with microservices, containerized environments, and GitOps (e.g. ArgoCD)
* Familiarity with CI/CD tools like GitHub Actions, Jenkins, or CircleCI
* Hands-on experience with monitoring tools like Prometheus and Grafana
Nice to have:
* Experience building Developer Experience (DevX) tools and workflows
* Familiarity with GPU setups (CUDA, TensorFlow, etc.)
* Strong networking and network security knowledge
* Linux/Unix skills and shell scripting
* A degree in Computer Science or a related field
✨ What we offer:
* Hybrid work – 3 days onsite in a vibrant Central London office
* Three-stage interview process – straightforward and transparent
* Competitive salary and benefits
* Work on real-world AI challenges with smart, passionate people