Research Engineer, Inference Serving | Frontier AI | Axioma Search

Research Engineer (Inference - Model Serving)

About

This is a well-funded frontier AI company building agentic systems that automate complex, multi-step work. 

This role is about making LLMs respond quickly, cheaply, and reliably for real users – under real traffic, with real latency constraints, and without wasting GPU capacity.

You’d join the inference team focused on serving logic and systems optimisation. The question is not how to train the model. It is how to run it well in production.

What you’ll do

  • Build and improve model serving systems for production LLM workloads
  • Optimise latency, throughput, batching, and memory use across inference pipelines
  • Work on serving architecture, scheduling, caching, and request handling under load
  • Improve GPU efficiency through systems-level profiling and performance tuning
  • Partner closely with model and product teams to ship fast, reliable inference
  • Diagnose bottlenecks in live serving paths and turn them into concrete improvements
  • Strengthen observability and debugging around production inference performance

What you’ll need

  • Strong experience with LLM serving, ML systems, or high-performance inference
  • Good low-level understanding of GPU performance, memory behaviour, and serving trade-offs
  • Strong Python skills; C++, Rust, CUDA, or Triton would be useful
  • Experience working on latency-sensitive distributed systems in production
  • Ability to profile systems, isolate bottlenecks, and improve performance end-to-end
  • Comfort working close to research teams in a fast-moving, loosely specified environment

Bonus

  • Experience with custom kernels, quantisation, or compiler-level optimisation
  • Background in model serving frameworks or large-scale inference platforms

Shortlisted candidates will be contacted within 48 hours.

  • Location: London, Paris
  • Salary / Compensation: Up to £180k + equity package
  • Sectors: Agentic, Frontier AI / Foundation Models, GenAI
  • Skills: LLM Serving, GPU Optimisation, CUDA/Triton, Distributed Systems, PyTorch, Inference Performance

Role Contact

Calvin Duffy


Didn't find the right role?

Send us your CV.