About
This is a well-funded frontier AI company building agentic systems that automate complex, multi-step work.
This role is about making LLMs respond quickly, cheaply, and reliably for real users – under real traffic, with real latency constraints, and without wasting GPU capacity.
You’d join the inference team focused on serving logic and systems optimisation. The question is not how to train the model. It is how to run it well in production.
What you’ll do
- Build and improve model serving systems for production LLM workloads
- Optimise latency, throughput, batching, and memory use across inference pipelines
- Work on serving architecture, scheduling, caching, and request handling under load
- Improve GPU efficiency through systems-level profiling and performance tuning
- Partner closely with model and product teams to ship fast, reliable inference
- Diagnose bottlenecks in live serving paths and turn them into concrete improvements
- Strengthen observability and debugging around production inference performance
What you’ll need
- Strong experience with LLM serving, ML systems, or high-performance inference
- Solid understanding of low-level GPU performance, memory behaviour, and serving trade-offs
- Strong Python skills; C++, Rust, CUDA, or Triton would be useful
- Experience working on latency-sensitive distributed systems in production
- Ability to profile systems, isolate bottlenecks, and improve performance end-to-end
- Comfort working closely with research teams in a fast-moving, loosely specified environment
Nice to have
- Experience with custom kernels, quantisation, or compiler-level optimisation
- Background in model serving frameworks or large-scale inference platforms
Shortlisted candidates will be contacted within 48 hours.