Description
Own and evolve a Proactive Resilience product/capability that anticipates, prevents, and mitigates technology and service disruption. You’ll translate resilience outcomes (availability, recoverability, performance, operational readiness) into a clear product roadmap, measurable value, and repeatable adoption across platforms and teams.
Key responsibilities
* Product strategy & roadmap
o Define product vision, target users and a prioritised roadmap aligned to business services.
o Maintain a clear, prioritised backlog of resilience features.
* Outcome-driven delivery
o Set OKRs/KPIs for proactive resilience.
o Run a Community of Practice that surfaces potential resilience improvements, and capture and prioritise them in a backlog.
* Resilience-by-design
o Embed resilience enhancements into the SDLC and change processes (non-functional requirements, release readiness, operational acceptance).
o Champion practices such as chaos engineering, game days, fault injection, capacity and performance testing, and DR readiness.
* Observability & insights
o Partner with monitoring/observability teams to improve telemetry, alert quality, and actionable dashboards.
o Use data to identify systemic risks, recurring failure modes, and top offenders across services.
* Automation & operational excellence
o Prioritise automation for detection, triage, and remediation.
* Stakeholder management
o Align engineering, operations, architecture, risk, and business stakeholders on resilience priorities.
o Communicate progress and risk clearly to senior leadership; manage dependencies and delivery risks.
* Governance & controls
o Ensure the product supports relevant operational resilience expectations (e.g. impact tolerances, testing evidence, auditability).
o Maintain documentation, controls evidence, and reporting suitable for risk and assurance audiences.
Required experience & skills
Product ownership/management experience in platform, SRE or operational resilience domains.
Strong understanding of:
* Operational Resilience
* SRE principles (SLO/SLI), incident/problem management, and service management.
* Resilience patterns (redundancy, graceful degradation).
* DR/BCP concepts (RTO/RPO), high availability, and dependency management.
Data-driven decision-making: ability to use incident, change, and telemetry data to prioritise.
Agile delivery expertise (Scrum/Kanban), backlog management, and stakeholder communication.
Desirable
* Familiarity with resilience patterns and platform engineering.
* Experience running game days/chaos experiments and translating findings into engineering work.
* Financial services experience and comfort working with risk, compliance, and audit partners.
Skills
* Product Ownership
* Product Management
* Operational Resilience
* Technology
* Disaster Recovery
* Resilience
* Proactive Resilience
* Product Roadmapping
* SRE Principles
* SLO
* SLI
* Incident management
* Problem management
* Service management
* DR
* BCP
* RTO
* RPO
* Dependency Management
Job Title
Product Owner - Operational Resilience
Location
Sheffield, UK
Job Type
Contract