About
Well-funded frontier AI startup building state-of-the-art agentic systems that automate complex, multi-step tasks. The Models team develops the core LLMs and vision-language models, with a focus on instruction following, tool use, and reliable decision-making at controlled inference cost.
What you'll do
- Research post-training methods for large multimodal language models, with a focus on RL and feedback-driven learning
- Design reward models and large-scale reinforcement learning setups
- Build automated data collection pipelines using human and machine feedback
- Develop evaluations that capture real capability gains
- Translate product failures and use cases into improved training signals
What you'll need
- Strong research background with hands-on experience in LLM post-training, alignment, or reinforcement learning
- Proficiency in Python and at least one major deep learning framework (PyTorch, JAX, or TensorFlow)
- Experience training large models on distributed systems
- Publications at top-tier conferences (NeurIPS, ICML, ICLR, ACL, CVPR, etc.)
- Comfort working in fast-moving research environments
Shortlisted candidates will be contacted within 48 hours.