Lead Site Reliability Engineer
Central London (Hybrid)
Up to £95k + Car Allowance & Bonus
TRIA are working with a leading hospitality client that is investing heavily in the performance, stability, and reliability of its digital platforms and is looking for a Lead SRE.
This is a hands-on leadership role - you won’t just guide others, you’ll be the go-to expert when systems are under pressure. You’ll lead incident response, own root cause analysis, and resolve problems such as memory leaks, outages, and flaky services.
Your focus will include:
* Leading incident management, post-mortems, and blameless RCAs
* Building scalable, resilient microservices with the dev teams
* Uplifting observability
* Improving alerting, monitoring, and system-level metrics
* Driving better SLOs, SLIs, and overall uptime
The stack includes Kubernetes, Terraform, AWS, Python, and modern CI/CD tools, and it continues to evolve.
If you’re confident in a crisis, understand what good SRE practice looks like, and want to leave systems in a better place than you found them, please apply to be considered and to learn more.
What you’ll bring:
* Experience in high-traffic digital or eCommerce platforms
* 5+ years in SRE/DevOps roles; strong background in incident response
* Observability, automation, and infrastructure as code expertise
* Leadership skills - mentoring others or leading from the front