Job Description
I'm partnered with a major organisation that's going through a huge SRE modernisation, and they're growing a brand-new, engineering-focused SRE function across their cloud platforms. We're now looking for an experienced Senior Site Reliability Engineer to join the team.
This is real SRE work: reducing toil, building automation, improving system reliability and observability, and supporting large-scale cloud environments across Azure and GCP.
The Role
You'll be part of a unified SRE team supporting multiple cloud teams, working on:
* Reliability, performance and observability across Azure/GCP
* Automation to reduce repeat incidents, tickets, and manual processes
* Improving SLOs, SLIs, error budgets and platform health
* Building and maintaining Terraform modules, GitHub pipelines and IaC
* Supporting app teams as they migrate large workloads to cloud
* 1-in-4 on-call (enhanced pay)
What They're Looking For
* 5+ years' experience as an SRE in large/complex environments
* Strong Azure and/or GCP capability
* Terraform + CI/CD experience (GitHub, IaC, scripting)
* Deep understanding of observability, data, logs and alerting
* Someone who wants to help shape a modern SRE culture - not just keep the lights on
Why It's a Great Move
* Massive modernisation programme
* Opportunity to influence tooling, processes and culture
* Multi-cloud exposure (Azure + GCP)
* Proper engineering autonomy
* Clear progression opportunities as the team scales
What They're Offering
* Hybrid working, with 2 days per week in the office (Leeds, Halifax, Manchester, Bristol or Edinburgh)
* Enhanced benefits package, including a flexible cash sum, private medical, an enhanced pension contribution, 28 days' holiday plus bank holidays, and more
If you are interested in finding out more, please send across an updated version of your CV, clearly demonstrating your relevant experience!