Staff Site Reliability Engineer

About

Enterprise operations are still held together by emails, spreadsheets, and systems that don’t talk to each other. The gap isn’t model capability — it’s turning messy, exception-heavy workflows into systems that actually run end-to-end.

You’d join a seed-stage team (~35 people, ~50% engineering) building an agentic platform already used by enterprise retail teams. The product runs live workflows across supplier portals, ERPs, inboxes, and internal tools — often without clean APIs or well-defined processes.

You’ll be the first dedicated SRE. Infrastructure exists, but ownership is fragmented. You’ll take over a live system (Terraform modules, an OpenTelemetry pipeline, Prometheus/Grafana) and build the reliability function from first principles.

What you’ll do

Own platform reliability across infrastructure, observability, and incident response
Design and improve systems for long-running, unpredictable workloads
Build monitoring, alerting, and tracing that surface issues before customers notice them
Lead incident response, root cause analysis, and long-term fixes
Strengthen security across multi-tenant environments, including isolation and encryption
Improve infrastructure automation using Terraform, CI/CD, and container orchestration
Optimise system performance, plan capacity, and manage cost trade-offs
Partner with engineers to improve reliability without slowing product velocity

What you’ll need

Strong experience operating distributed systems in production environments
Deep understanding of reliability trade-offs: latency, consistency, availability
Hands-on experience with cloud infrastructure (GCP preferred) and Kubernetes
Strong observability experience (Prometheus, Grafana, OpenTelemetry or similar)
Experience with infrastructure as code and automation (Terraform, CI/CD)
Security mindset — comfortable working with sensitive enterprise data and isolation models
High ownership — you build systems, not just maintain them

Optional Bonus

Experience with AI/ML systems or LLM-based applications
Exposure to sandboxed execution or multi-tenant platforms
Background in product engineering or full-stack systems

Shortlisted candidates will be contacted within 48 hours.

Location
Remote
Salary / Compensation
Up to £160k + equity
Sectors
Agentic, GenAI
Skills
GCP, Kubernetes, Terraform, Observability, Distributed Systems, Security

Role Contact

Bethany Sellar

beth@axiomasearch.com

VP Engineering

A frontier agentic AI company is hiring a VP Engineering to lead the transition from breakthrough technology to global platform — bridging frontier research and enterprise-grade deployment.

Location
London, Paris
Type
Hybrid
Salary
€300k base + equity

Research Engineer (Training Infrastructure)

Build the training stack behind large multimodal models used in agentic AI. This role sits close to research and focuses on distributed training, reliability, and performance at meaningful scale.

Location
London, Paris
Type
Hybrid
Salary
Up to £180k + equity package

Founding ML Engineer

Founding ML hire at an early-stage startup pre-training foundation models for time-series forecasting. Own the training infrastructure and model architecture from the ground up.

Location
Paris
Type
On-site
Salary
€125k + equity package
