About
Early-stage robotics and foundational AI company building universal foundation models for general-purpose mobile robots. The approach is vertically integrated: simulation-first data, proprietary multimodal models, tight hardware integration, and, over the longer term, custom silicon development.
You will join the team responsible for the inference stack that runs models in real-world robotic systems.
What you'll do
- Build low-latency inference pipelines for on-device deployment
- Design and optimise distributed GPU inference systems
- Integrate low-level performance code (CUDA, Triton, custom kernels) into high-level ML frameworks (see the Triton sketch after this list)
- Optimise for throughput and latency across batching, scheduling, quantisation, caching, and memory management (a dynamic-batching sketch follows this list)
- Develop monitoring and debugging tooling for determinism and reliability
- Tune hardware–software interactions for latency-critical robotics environments
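To give a flavour of the kernel-integration work, here is a minimal, illustrative sketch of a Triton element-wise kernel exposed through a thin PyTorch wrapper. It is not the team's actual stack; it assumes `torch` and `triton` are installed and a CUDA GPU is available, and the names `add_kernel` and `add` are hypothetical.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # PyTorch-facing wrapper: allocate output, compute launch grid, launch.
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out


if __name__ == "__main__":
    a = torch.randn(4096, device="cuda")
    b = torch.randn(4096, device="cuda")
    assert torch.allclose(add(a, b), a + b)
```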
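And as a flavour of the throughput/latency trade-offs involved, below is a minimal dynamic-batching sketch: requests arriving within a short window are coalesced into one model call, trading a few milliseconds of latency for throughput. Everything here is hypothetical and purely illustrative: `run_model` stands in for a real forward pass, and `MAX_BATCH` / `MAX_WAIT_S` are assumed tuning knobs.

```python
import asyncio
import time

MAX_BATCH = 8        # assumed cap on batch size
MAX_WAIT_S = 0.005   # assumed batching window (5 ms)


def run_model(batch):
    # Hypothetical placeholder for a real batched forward pass.
    return [x * 2 for x in batch]


async def batcher(queue: asyncio.Queue):
    while True:
        # Block for the first request, then greedily gather more until the
        # window closes or the batch is full.
        x, fut = await queue.get()
        batch, futures = [x], [fut]
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                x, fut = await asyncio.wait_for(queue.get(), timeout)
                batch.append(x)
                futures.append(fut)
            except asyncio.TimeoutError:
                break
        for fut, y in zip(futures, run_model(batch)):
            fut.set_result(y)


async def infer(queue: asyncio.Queue, x):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((x, fut))
    return await fut


async def main():
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(batcher(queue))
    results = await asyncio.gather(*(infer(queue, i) for i in range(20)))
    print(results)
    task.cancel()


asyncio.run(main())
```

The design choice being illustrated: a bounded wait window keeps worst-case added latency predictable while still amortising per-call overhead across concurrent requests.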
What you'll need
- Extensive experience in distributed systems, ML infrastructure, or high-performance serving
- Production-grade Python; experience with C++, Rust, or Go is a plus
- Deep low-level performance expertise (CUDA, Triton, kernel optimisation, quantisation, memory and compute scheduling)
- Proven experience scaling inference workloads in both cluster and on-device environments
- System-level mindset with experience optimising hardware–software stacks
Shortlisted candidates will be contacted within 48 hours.