Product owner - operational resilience

Sheffield

Teksystems

Posted: 8 June

Description

Own and evolve a Proactive Resilience product/capability that anticipates, prevents, and mitigates technology and service disruption. You’ll translate resilience outcomes (availability, recoverability, performance, operational readiness) into a clear product roadmap, measurable value, and repeatable adoption across platforms and teams.

Key responsibilities include:

Product strategy & roadmap

- Define product vision, target users and a prioritised roadmap aligned to business services.

- Maintain a clear backlog of resilience features.

Outcome-driven delivery

- Set OKRs/KPIs for proactive resilience.

- Run a Community of Practice to surface potential resilience improvements, which are then captured and prioritised in a backlog.

Resilience-by-design

- Embed resilience enhancements into the SDLC and change processes (non-functional requirements, release readiness, operational acceptance).

- Champion practices such as chaos engineering, game days, fault injection, capacity and performance testing, and DR readiness.

Observability & insights

- Partner with monitoring/observability teams to improve telemetry, alert quality, and actionable dashboards.

- Use data to identify systemic risks, recurring failure modes, and “top offenders” across services.

Automation & operational excellence

- Prioritise automation for detection, triage, and remediation.

Stakeholder management

- Align engineering, operations, architecture, risk, and business stakeholders on resilience priorities.

- Communicate progress and risk clearly to senior leadership; manage dependencies and delivery risks.

Governance & controls

- Ensure the product supports relevant operational resilience expectations (impact tolerances, testing evidence, auditability).

- Maintain documentation, controls evidence, and reporting suitable for risk and assurance audiences.

Required experience & skills

Product ownership/management experience in platform, SRE, or operational resilience domains.

Strong understanding of:

- Operational Resilience

- SRE principles (SLO/SLI), incident/problem management, and service management.

- Resilience patterns (redundancy, graceful degradation).

- DR/BCP concepts (RTO/RPO), high availability, and dependency management.

Data-driven decision-making: ability to use incident, change, and telemetry data to prioritise.

Agile delivery expertise (Scrum/Kanban), backlog management, and stakeholder communication.

Desirable

Familiarity with resilience patterns and platform engineering.

Experience running game days/chaos experiments and translating findings into engineering work.

Financial services experience and comfort working with risk, compliance, and audit partners.

Skills

1. Product Ownership
2. Product Management
3. Operational Resilience
4. Technology
5. Disaster Recovery
6. Resilience
7. Proactive Resilience
8. Product Roadmapping
9. SRE Principles
10. SLO
11. SLI
12. Incident Management
13. Problem Management
14. Service Management
15. DR
16. BCP
17. RTO
18. RPO
19. Dependency Management

Job Title: Product Owner - Operational Resilience

Location: Sheffield, UK

Job Type: Contract

Trading as TEKsystems. Allegis Group Limited, Maxis 2, Western Road, Bracknell, RG12 1RT, United Kingdom. No. 2876353. Allegis Group Limited operates as an Employment Business and Employment Agency as set out in the Conduct of Employment Agencies and Employment Businesses Regulations 2003. TEKsystems is a company within the Allegis Group network of companies (collectively referred to as "Allegis Group"). Aerotek, Aston Carter, EASi, Talentis Solutions, TEKsystems, Stamford Consultants and The Stamford Group are Allegis Group brands.

Similar job

Full stack java developer

Sheffield

Teksystems

Java developer

€375,000 a year

Similar job

Programme manager

Sheffield

Teksystems

Programme manager

€550 a month

Similar job

Data strategy programme lead - uk banking

Sheffield

Teksystems

Banking

€550 a month