Building a living model of the world that people and machines can talk to. Powered by a proprietary database of over 30 billion posed images and a next-gen digital map, they are developing the spatial intelligence that helps humans and machines understand, navigate, and engage with the physical world.
As a Technical Anchor in their London R&D hub, you will bridge the gap between 3D computer vision and Vision-Language Models (VLMs), creating a unified framework where machines can reason about their surroundings.
What You’ll Be Doing:
* Architect Semantic Grounding: Lead research into cross-modal grounding connecting 3D spatial features with language embeddings.
* Scale "Understand" Capabilities: Develop algorithms for continuous semantics, allowing 3D maps to evolve and improve situational awareness.
* Agentic Frameworks: Build the "spatial brain" for Embodied AI, enabling robots, drones, and other machines to progress to mission-level reasoning.
* Multimodal Benchmarking: Define standards for measuring "spatial common sense" in LLMs/VLMs.
* Technical Mentorship: Act as the technical anchor for the London hub, guiding architecture and mentoring researchers.
What We Are Looking For:
* Education: PhD (or equivalent) in Computer Vision, Machine Learning, or Robotics focusing on Multimodal/Semantic understanding.
* Experience: 4+ years of ML research experience with a track record of shipping models bridging 3D Vision and Language.
* Technical Depth: Expert knowledge of 3D Geometry (SfM, SLAM, VPS) and Transformer-based architectures (VLMs).
* Research Impact: Multiple first-author publications at top-tier venues (CVPR, NeurIPS, ICLR).
* Code Mastery: Production-quality research code in PyTorch or JAX, plus experience managing large-scale data pipelines.
* Location: Ability to work hybrid from their London office (3 days/week).