Research Engineer (Data Infra/ML)
London (Hybrid)
Can you build & optimize distributed ML pipelines with Ray or Spark?
Do you love speeding up cloud infra (Kubernetes, Docker, CI/CD)?
Excited to build the data backbone for large-scale ML training?
We're a start-up backed by tier 1 VCs, developing hyper-realistic 3D simulations using AI. Our customers include leading names in autonomous vehicles, drones, and robotics.
Role
You’ll be hands-on improving CI/CD pipelines, speeding up Docker builds, and scaling scene processing on Ray. You’ll also:
* Build high-performance data pipelines for multimodal datasets (3D, video, sensor).
* Optimize distributed training and processing across Spark, Databricks, and Kubernetes.
* Work with researchers to productionize PyTorch models and streamline ML workflows.
* Develop tools that make data discoverable, reusable, and reliable throughout the ML lifecycle.
You
* Strong Python skills and experience with distributed systems (Ray, Spark, Flyte, Dask).
* Hands-on experience with cloud infrastructure, Kubernetes, and distributed training (Ray, PyTorch DDP, Horovod).
* Familiar with dataset versioning and experiment tracking (DVC, MLflow).
Bonus Points
* Experience in simulation, robotics, or autonomy pipelines.
* Background in deep learning (PyTorch) and 3D/sensor data (LiDAR, meshes, radiance fields).
* Open-source contributions or frontend/UI experience.