Lead Site Reliability Engineer (Manchester, Leeds, Edinburgh, Bristol, Halifax)
About the opportunity
Head Resourcing is pleased to be working with one of the UK's leading financial services organisations, a large-scale business undergoing significant technology transformation with cloud, automation and engineering excellence at the heart of its strategy. The environment is fast-moving, highly collaborative and focused on building modern, secure and resilient platforms that support both customers and engineering teams.
We are looking for an experienced Lead Site Reliability Engineer to play a key role in improving reliability, observability and operational performance across Azure and GCP environments. This role offers the opportunity to lead a highly capable SRE function, shape engineering best practice, and influence how reliable cloud services are designed and operated at scale.
The role
As a Lead SRE, you will take ownership of reliability and operational excellence across a portfolio of cloud platforms and services. You will work closely with engineering, product and platform stakeholders to ensure systems are scalable, observable and resilient, while helping teams balance innovation with stability.
Key responsibilities will include:
* Leading and developing a team of Site Reliability Engineers, creating a culture centred on continuous improvement, technical quality and shared learning.
* Partnering with application and engineering teams to support cloud migration activity and improve operational readiness.
* Working with Product Owners and Engineering Leads to ensure reliability, performance and service health remain a priority alongside feature delivery.
* Using monitoring, observability and service data to identify risks early, improve platform performance and reduce repetitive operational work.
* Driving improvements in incident and problem management, with a strong emphasis on root cause analysis, learning and service resilience.
* Promoting SRE principles such as SLOs, SLIs, error budgets and proactive reliability management across aligned teams.
* Contributing to engineering standards, platform direction and best practice for resilient cloud-based services.
What we’re looking for
Technical expertise
* Strong background in cloud engineering, ideally with experience across both GCP and Azure.
* Experience designing, supporting or improving large-scale cloud platforms with a focus on resilience, availability and performance.
* Strong understanding of observability practices, including metrics, logs and distributed tracing, and the ability to use data to drive service improvements.
* Hands‑on knowledge of core Site Reliability Engineering practices, including:
  * Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
  * Automation to reduce manual effort and operational overhead
  * Production readiness reviews and effective post‑incident learning
* Good understanding of GitHub pipelines and Terraform modules.
Leadership and collaboration
* Previous experience leading engineering or platform teams and building high‑performing, inclusive environments.
* Confidence working across complex stakeholder groups and translating technical concepts into clear, practical language.
* A collaborative approach, with the ability to work effectively across engineering, product and platform functions.
Approach and mindset
We are particularly interested in someone who:
* Is committed to building resilient, observable and customer‑focused platforms.
* Enjoys mentoring and coaching others while helping to shape engineering culture.
* Looks for opportunities to simplify operations, remove toil and increase automation.
* Is adaptable and open‑minded, choosing the right tools and approaches based on the problem at hand.
* Thrives in cross‑functional teams and enjoys working in a modern engineering environment.
* Brings curiosity, continuous learning and a strong focus on improvement.
* Values inclusion, psychological safety and diverse perspectives.
What's on offer
* Up to £106k, depending on experience
* Pension
* Bonus
* 30 days' annual leave plus public holidays
Hybrid
This role requires a minimum of 2 days a week on site at one of the following locations: Manchester, Leeds, Edinburgh, Bristol or Halifax.
If this sounds like you, we would like to hear from you!