About
An early-stage robotics and foundational-AI company building universal robotics foundation models for general-purpose mobile robots. The team focuses on simulation-first data generation, proprietary multimodal models, tight hardware integration, and the large-scale training systems that power real-world robotics.
You will join the team responsible for the core model training stack.
What you'll do
- Build and optimise training infrastructure for large-scale vision-language and multimodal foundation models
- Design systems for long-context video training, including sequence parallelism at scale
- Support autoregressive and diffusion-based models for actions and video
- Implement sampling during training (self-forcing) to reduce the distribution drift between training-time and inference-time inputs
- Enable RL post-training for multimodal models
- Own data flow, memory movement, and GPU utilisation across complex training loops
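
To make the self-forcing bullet above concrete: the idea is that instead of always teacher-forcing an autoregressive model on ground-truth context, some input positions are replaced with the model's own sampled predictions during training, so the contexts the model trains on resemble those it will see at inference. The sketch below is purely illustrative and is not code from this role; `self_forcing_inputs`, `predict_fn`, and `mix_prob` are hypothetical names chosen for the example.

```python
import random

def self_forcing_inputs(targets, predict_fn, mix_prob, rng):
    """Build the input sequence for one autoregressive training step.

    Under pure teacher forcing, the input at step t is the ground-truth
    token targets[t-1]. Here, with probability `mix_prob`, that input is
    replaced by the model's own prediction over the prefix built so far,
    which narrows the gap between training and inference distributions.
    """
    inputs = [targets[0]]  # the first token is always given
    for t in range(1, len(targets)):
        if rng.random() < mix_prob:
            inputs.append(predict_fn(inputs[:t]))  # model's own output
        else:
            inputs.append(targets[t - 1])          # shifted ground truth
    return inputs

# Toy usage with a stand-in "model" that predicts last-token + 1:
rng = random.Random(0)
teacher = self_forcing_inputs([0, 1, 2, 3, 4], lambda p: p[-1] + 1, 0.0, rng)
forced = self_forcing_inputs([0, 1, 2, 3, 4], lambda p: p[-1] + 1, 1.0, rng)
# teacher forcing uses shifted targets; self-forcing uses model rollouts
```

In a real training loop the same mixing decision would be made per token inside the forward pass (with gradients handled appropriately); the annealing schedule for `mix_prob` is a design choice not specified here.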
What you'll need
- Extensive experience in ML infrastructure, distributed systems, or high-performance computing
- Direct experience training large vision-language or multimodal foundation models
- Strong background in large-scale distributed training and GPU performance tuning
- Experience from top AI labs, frontier model teams, or elite infrastructure groups
Shortlisted candidates will be contacted within 48 hours.