Requirements
* 4+ years of experience in machine learning; experience with data‑centric approaches is a plus
* Experience with large multimodal datasets and generative models (video, image, or multimodal)
* Deep intuition for how data composition and quality translate to model capabilities
* Comfort working across the full research stack: data analysis, dataset creation, model training, evaluation, and back again
* Proficiency with at least one ML framework (e.g. PyTorch, JAX) and distributed compute tools (e.g. Ray, Kubernetes)
* Excitement about building AI that simulates the world
What the job involves
We are looking for a Research Engineer to own the data behind our models: what they learn from, how well they learn it, and what new capabilities that unlocks.
You will design datasets, run modeling experiments, and build the infrastructure to generate and curate data at scale, directly shaping what our models can do, with applications ranging from creative tools to robotics.
* Design multimodal, multitask datasets that teach world models new capabilities: decide what data to collect, generate, or curate, and measure its effect on model behavior
* Run controlled training experiments to understand how data composition drives model performance across tasks and domains
* Build and operate large‑scale pipelines for synthetic data generation, filtering, and quality control
* Define evaluations and benchmarks that measure whether our models are actually improving at the things that matter
* Partner with product and creative teams to translate target behaviors and capabilities into concrete data strategies