 
        
Lead Site Reliability Engineer | Fully Remote | AWS, Kubernetes, Terraform | High-Scale SaaS | £90K
Company Overview
High-growth SaaS platform operating at national scale, powering critical services for thousands of customers daily. Engineering-led, cloud-native, and focused on delivering highly reliable, scalable distributed systems.
This is a senior leadership role driving SRE strategy, platform scalability, and operational excellence. You’ll own reliability, performance and automation across multiple engineering teams, evolving the platform to handle rapid growth.
Key Responsibilities & Experience
 * Define and scale SRE practices across product teams.
 * Ideally from a technical or software engineering (SWE) background.
 * Own system design for reliability, scalability and performance.
 * Lead platform reliability, availability and incident management.
 * Drive automation, IaC, observability and continuous improvement.
 * Guide root cause analysis and implement resilience strategies.
 * Mentor and technically lead SRE / Platform engineers.
 * Support large-scale re-architecture, capacity planning and FinOps alignment.
Core Technical Environment
 * Cloud: AWS (high-throughput systems: 1,000–6,000+ req/sec)
 * IaC: Terraform, configuration management
 * Containers: Kubernetes, Docker (ECS beneficial)
 * Languages: Python, Go or similar
 * Observability: Prometheus, Datadog or equivalents
 * CI/CD: Modern automated pipelines
 * Systems: Distributed systems, microservices, resilience engineering