Site Reliability Engineer, Agentic AI | Remote | Axioma Search

Staff Site Reliability Engineer

About

Enterprise operations are still held together by emails, spreadsheets, and systems that don’t talk to each other. The gap isn’t model capability — it’s turning messy, exception-heavy workflows into systems that actually run end-to-end.

You’d join a seed-stage team (~35 people, ~50% engineering) building an agentic platform already used by enterprise retail teams. The product runs live workflows across supplier portals, ERPs, inboxes, and internal tools — often without clean APIs or well-defined processes.

You’ll be the first dedicated SRE. Infrastructure exists, but ownership is fragmented. You’ll take over a live system (Terraform modules, an OpenTelemetry pipeline, Prometheus/Grafana) and build the reliability function from first principles.

What you’ll do

  • Own platform reliability across infrastructure, observability, and incident response
  • Design and improve systems for long-running, unpredictable workloads
  • Build monitoring, alerting, and tracing that surface issues before customers do
  • Lead incident response, root cause analysis, and long-term fixes
  • Strengthen security across multi-tenant environments, including isolation and encryption
  • Improve infrastructure automation using Terraform, CI/CD, and container orchestration
  • Optimise system performance, capacity planning, and cost trade-offs
  • Partner with engineers to improve reliability without slowing product velocity

What you’ll need

  • Strong experience operating distributed systems in production environments
  • Deep understanding of reliability trade-offs: latency, consistency, availability
  • Hands-on experience with cloud infrastructure (GCP preferred) and Kubernetes
  • Strong observability experience (Prometheus, Grafana, OpenTelemetry or similar)
  • Experience with infrastructure as code and automation (Terraform, CI/CD)
  • Security mindset — comfortable working with sensitive enterprise data and isolation models
  • High ownership — you build systems, not just maintain them

Optional Bonus

  • Experience with AI/ML systems or LLM-based applications
  • Exposure to sandboxed execution or multi-tenant platforms
  • Background in product engineering or full-stack systems

Shortlisted candidates will be contacted within 48 hours.

  • Location: Remote
  • Salary / Compensation: Up to £160k + equity
  • Sectors: Agentic, GenAI
  • Skills: GCP, Kubernetes, Terraform, Observability, Distributed Systems, Security

Role Contact

Bethany Sellar

