Operational Resilience & Incident Manager
Location: London, England, United Kingdom. Posted 2 weeks ago.
About the team
This role sits within our Technology Operations & Risk Team, which is responsible for IT Service Management as well as first line of defence (1LOD) Technology Risk and Business Change. Our goal is to help the technology organisation and wider business achieve their commercial objectives through balanced risk management, effective change delivery, and resilience built on timely response to and resolution of incidents. As an enablement function for the business, we are continuously looking at ways to simplify and strengthen our processes through automation and AI.
A Day In The Life
* Take ownership of the end‑to‑end incident management lifecycle for our technology services, ensuring rapid detection, response, triage, remediation and post‑incident reviews, aligned to business needs and SLAs.
* Lead the major incident / technical incident response process: ensure correct escalation, coordinate multi‑discipline teams (infrastructure, applications, security, operations, etc.), communicate status to stakeholders, restore service with minimal business impact, and ensure root‑cause analysis and corrective/preventive actions are completed.
* Oversee the tooling ecosystem for incident management: ensure that incident management platforms (ticketing, alerting, monitoring dashboards, runbooks) are configured, maintained and optimised for efficiency, automation and metrics.
* Monitor service performance and incident metrics (mean time to detect, mean time to resolve, recurrence rates, SLA compliance) and present reports to tech teams and senior leadership, using data to drive continual service improvement.
* Partner with the security operations / cyber incident team to manage security‑related incidents (e.g., logging/alerting, log‑analysis, triage of suspicious activity, coordinating with SOC, forensic hand‑off, lessons‑learned).
* Own and evolve processes under recognised service‑management frameworks (for example ITIL / ISO/IEC 20000), ensuring they reflect Zopa Bank’s environment, governance requirements and regulatory obligations.
* Work closely with other IT and business stakeholders (operations, risk, compliance, development, infrastructure) to ensure service delivery is aligned to business outcomes, risks are managed and continuous improvement is embedded.
* Train and mentor colleagues across the business, ensuring they have the right skills, tooling access and processes to deliver excellent service and response to incidents.
About You
* Hands‑on experience in IT service delivery/operations, with specific responsibility for incident management in a complex, high‑availability environment.
* Proven track record of technical incident management: coordinating major incidents, triaging complex issues across infrastructure, applications and networks, coordinating cross‑team responses and driving resolution.
* Strong experience with incident management tooling and platforms (for example ticket/alerting systems such as Jira Service Management, PagerDuty, Splunk, monitoring/observability tools, log‑analysis tools) and the ability to optimise workflows and automate where appropriate.
* Certifications in relevant frameworks such as ITIL (Foundation or preferably Intermediate/Expert) and familiarity with service‑management standards such as ISO/IEC 20000.
* Experience with, or strong awareness of, security incident management practices: log analysis, triage of security alerts, coordination with SOC teams, performing root‑cause analysis for security events.
* Excellent stakeholder management and communication skills: able to translate technical incident status into business‑impact language, manage expectations, and keep leadership informed.
* Strong analytical and problem‑solving skills: ability to review incident metrics, spot trends, hypothesise root causes, and initiate improvements.
* Leadership skills: able to lead through incident pressure, guide teams, ensure calm in high‑stress incident responses, and drive incident review and lessons‑learned into service improvements.
* A proactive mindset for continual improvement: you identify and drive enhancements to processes, tooling, service‑levels, and resilience across the IT service management domain.
* Ideally, experience working in a regulated environment (such as a bank/financial services) with an understanding of risk, compliance, audit and service‑governance implications of incident management.
Added Bonus
* Experience with observability, monitoring and alerting platforms (for example Splunk, Prometheus, Grafana) and hands‑on log‑analysis skills (e.g., analysing event logs, application logs, network logs to support incident response or security investigations).
* Certification in, or working knowledge of, security or cyber incident response (for example CISSP, CISM, or incident response training).
* Experience in designing and delivering major‑incident simulation/drill programmes or running incident simulation exercises (table‑top drills) to test readiness.
* Experience in managing / migrating incident management tools and platforms.
* Experience working in a fintech or banking environment with real‑time services, where high availability, change‑control, risk and regulatory constraints are part of the daily operating model.
* Knowledge of DevOps/CI/CD and how incident management integrates with deployment pipelines, monitoring, post‑release health checks and service resilience.
* Experience managing external vendors/third‑party service providers in the incident response context (escalation, service‑contracts, SLAs, hand‑off).
* Familiarity with cloud‑native incident monitoring and response (e.g., AWS CloudWatch) and hybrid infrastructure environments.
At Zopa we value flexible ways of working
This hybrid role requires you to come to our London office 2‑3 days a week. You'll also have the option of working from abroad for up to 120 days a year! But no matter where you are, we’ll make sure you’ve got everything you need to thrive, both in your work and home life, from day one.
Diversity Statement
Zopa is proud to offer a workplace free from discrimination. Diversity of experience, perspectives and backgrounds leads to better products for our customers and a unique company culture for our people. We are made up of nearly 50 nationalities, have a DE&I forum made up of Zopians wanting to make a difference, and are proud of our culture where everyone can bring their full self to work. Our approach to DE&I is reflected in our hiring process so please let us know if you require any reasonable adjustments.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analysing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgement. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.
Seniority level: Not Applicable
Employment type: Full‑time
Job function: Information Technology
Industries: Financial Services, IT Services and IT Consulting, and Banking