Service reliability

Sheffield
Permanent
ManpowerGroup
Service
Posted: 7h ago
Offer description

Overview

Job Title: Senior Recovery Lead and Global Head of Service Reliability

Location: Sheffield (Hybrid)

6-month Contract

Service Management (SM)

Service Management’s purpose is to protect the availability, integrity and confidentiality of the IT Services that underpin customers’ and colleagues’ experience of the brand. It is a multi-functional team comprising Change Management, Incident Management, Problem Management, Service Level Management, Outage Management, Service Recovery, and Service Insights and Reporting.

About the Role

We are seeking a senior technology leader to take on the dual role of Senior Recovery Lead and Global Head of Service Reliability. This is a highly visible, high-impact position reporting to the Global Head of Service Management, with a mandate to transform how we recover from incidents and build long-term service resilience.

This individual will lead a global team of technical experts who act as escalation partners during major incidents, helping reduce time to recover (TTR) through deep technical engagement, coordination, and engineering-driven solutions. Beyond recovery, this leader will also own the strategic and tactical roadmap for building reliable, self-healing systems in collaboration with Problem Management, SRE, and Platform teams.


Key Responsibilities

* Lead a global, follow-the-sun team that acts as technical escalation partners during major incidents.
* Partner with Incident Managers and Service Owners to accelerate incident diagnosis and resolution, reducing TTR and restoring services quickly and safely.
* Bring calm, coordination, and engineering clarity to high-pressure recovery efforts.
* Collaborate with Problem Managers, Product SRE, and Platform Engineering teams to identify and eliminate systemic causes of major incidents.
* Own and drive long-term remediation plans, including automation, reliability engineering, and platform guardrails to reduce future risk.
* Track and govern follow-up actions to ensure completeness, accountability, and measurable reduction in incident recurrence.


Service Reliability Engineering Strategy

* Define and implement strategies for resilience engineering, including self-healing capabilities, automation of recovery workflows, and risk mitigation patterns.
* Advocate for operational excellence by embedding reliability standards, testing practices, and continuous improvement processes into engineering workflows.
* Partner with Architecture and Engineering leaders to influence system design with reliability in mind.
* Own the global incident scenario planning framework, ensuring that Technology is prepared to recover from widespread, complex failures.
* Design and run mass recovery simulations, chaos testing, and resilience drills to expose weaknesses and improve readiness.
* Work with regional and global risk teams to align with regulatory and operational resilience requirements.


Leadership, Influence & Culture

* Build, scale, and lead a high-performing global team with deep technical skills and a culture of urgency, ownership, and collaboration.
* Drive a blameless, learning-focused culture that emphasizes root cause thinking, accountability, and continuous improvement.
* Act as a trusted partner and thought leader across Engineering, Infrastructure, Risk, and Service Management functions.


Qualifications & Experience

* 12+ years in Technology, with proven experience in Site Reliability Engineering, Infrastructure, DevOps, or Technical Operations.
* Demonstrated experience leading global technical teams in complex, high-scale environments.
* Deep expertise in incident recovery, automation, systems design, and platform reliability.
* Strong working knowledge of problem management, root cause analysis frameworks, and resilience engineering principles.
* Experience designing and running resilience exercises, chaos engineering, or incident scenario testing at scale.
* Comfortable operating in regulated environments and partnering with Risk and Compliance functions.
* Excellent stakeholder management and communication skills, with the ability to lead through influence at senior levels.


Core Competencies

* Technical Depth – Ability to dive deep across infrastructure, applications, and cloud-native architectures.
* Recovery Leadership – Skilled in coordinating technical resources under pressure to resolve incidents rapidly.
* Reliability Thinking – Strategic mindset focused on system robustness, automation, and prevention.
* Change Agent – Drives cultural and engineering change to improve stability and accountability.
* Cross-Functional Collaboration – Adept at aligning goals and actions across engineering, operations, and risk domains.
