Lead Cloud Site Reliability Engineer, Leadership, Azure or GCP, SLOs, Automation
A leading financial services client is seeking a strong technical leader to drive and support a large group of SRE engineers across multiple locations. The role is split 50/50 between hands-on engineering and team management. This is an engineering role, not an operations role.
The role:
* Lead and mentor a team of up to 15 SREs, championing continuous improvement and engineering excellence.
* Partner with application teams as they migrate services to the cloud.
* Work with Product Owners and Engineering Leads to balance feature delivery with system reliability, performance and health.
* Use observability tooling, performance metrics and SRE principles to proactively identify issues and reduce operational toil.
* Implement incident and problem management practices, ensuring strong root cause analysis, a longer MTTF and a shorter MTTR.
* Champion SLOs, SLIs, error budgets and reliability‑first thinking.
* Influence platform direction and engineering standards to help shape resilient cloud services at scale.
Technical Skills required:
* Strong team management experience (day-to-day management, mentoring and coaching).
* Strong cloud engineering background, ideally across Azure and GCP.
* Experience building or operating large‑scale, resilient cloud platforms.
* Deep understanding of observability tooling (metrics, logs, traces).
* Hands‑on experience with modern SRE practices:
  * SLOs/SLIs
  * Automation to reduce toil
  * Production readiness and robust post‑mortems
* Solid understanding of GitHub pipelines and Terraform modules.
* Proven experience leading high‑performing engineering teams.
* Ability to communicate complex technical topics in a clear, accessible way.
* Comfortable working with diverse stakeholder groups.