About
This is a well-funded frontier AI company building agentic systems that automate complex, multi-step work. The models team sits close to product, training the foundation models behind those systems – with a focus on instruction following, tool use, multimodal understanding, and reinforcement learning.
The research is ambitious. The bottleneck is training infrastructure. This role exists to make large-scale LLM and VLM training faster, more reliable, and less painful as model size and system complexity increase.
What you’ll do
- Build and improve training infrastructure for large-scale LLMs and VLMs
- Work on distributed training systems using FSDP, DeepSpeed, Megatron, TorchTitan, or similar tools
- Optimise training performance, reliability, and GPU utilisation across large clusters
- Support multimodal model training, including data flow, checkpointing, and experiment workflows
- Partner closely with researchers to turn new training ideas into production-grade systems
- Debug hard failures in long-running jobs and improve observability across the stack
- Contribute to evaluation workflows and help teams ship model improvements faster
What you’ll need
- Strong Python engineering skills
- Hands-on experience with training infrastructure for large models (100B+ parameters)
- Good understanding of distributed training libraries such as FSDP, Megatron, TorchTitan, or DeepSpeed
- Experience supporting or training LLMs (or VLMs) at scale
- Familiarity with at least one major deep learning framework such as PyTorch, JAX, or TensorFlow
- Comfort working closely with researchers in ambiguous, fast-moving environments
- Strong communication skills and a low-ego, collaborative way of working
Shortlisted candidates will be contacted within 48 hours of applying.