Lead site reliability engineer

Slough

Holland & Barrett

Site reliability engineer

Posted: 11h ago

Offer description

Own Reliability. Shape the Platform. Empower Millions.

At Holland & Barrett, we're transforming into a truly product- and platform-led technology organisation — and we're looking for a Lead Site Reliability Engineer who's excited by scale, complexity, and impact.

Our mission? Build and evolve the resilient, high-performance systems that power health and wellness for millions of customers. If you're obsessed with reliability, driven by automation, and thrive in high-ownership engineering cultures, this is your opportunity to lead from the front.

What You'll Lead & Deliver

Reliability & Performance at Scale

* Architect and improve cloud-native systems with reliability as a first-class principle.
* Shape SLIs/SLOs, error budgets, capacity planning, and performance strategies.
* Continuously improve availability, efficiency, and resilience across our platforms.

Technical Leadership That Raises the Bar

* Mentor SREs, platform engineers, and developers across the organisation.
* Champion automation, observability, DevSecOps, and modern operational practices.
* Influence engineering culture and architectural direction.

Operational Excellence

* Own and lead high-severity incident response with calm, clarity, and technical depth.
* Run world-class post-incident reviews and drive meaningful, measurable improvements.
* Strengthen monitoring, alerting, on-call practices, and reliability processes.
* Support resilience validation through load testing, stress testing, and chaos engineering.

Automation, Tooling & Engineering Efficiency

* Build tools and automation that remove toil and accelerate teams.
* Develop CI/CD pipelines and Infrastructure-as-Code environments.
* Drive consistency, repeatability, and self-service across engineering.

Cross-Team Collaboration

* Partner with Security, Platform, and Engineering teams to align reliability with security and resilience goals.
* Lead teams toward better design, operational readiness, and measurable service health.
* Contribute to documentation, runbooks, and operational processes that scale.

Key requirements:

* 5–8+ years in SRE, Platform, Cloud Infrastructure, or operational engineering roles.
* Hands-on experience architecting and improving large-scale, distributed systems.
* Strong coding proficiency in Python, Go, Bash, or similar automation-focused languages.
* Expertise with observability stacks: Datadog, Prometheus, Grafana, OpenTelemetry.
* Deep AWS experience across EC2, EKS, Lambda, VPC, DynamoDB, S3, CloudFront, RDS, IAM, KMS, and more.
* Proficiency with Terraform, CloudFormation, or AWS CDK.
* Incident response leadership and root-cause analysis expertise.
* Excellent documentation and communication skills.
* Strong analytical and troubleshooting abilities.

Bonus

* Experience mentoring or leading engineers within SRE or platform teams.
* Experience with load testing, stress testing, and chaos engineering.
* A passion for uplifting engineering culture through tooling, automation, and reliability-first thinking.

Why Build the Future with Holland & Barrett?

Technology is at the heart of our mission to make health & wellness accessible to everyone. As a Lead SRE, you won't just keep systems running; you'll design the reliability, resilience, and operational maturity that accelerate our entire business.

We offer:

* A modern engineering culture built on autonomy, experimentation, and learning.
* The chance to create real impact across critical customer and internal platforms.
* A collaborative team that values innovation, continuous improvement, and technical excellence.

If you're ready to lead reliability for platforms with massive real-world impact, we'd love to meet you.

Apply now and help shape the future of H&B Technology.

