A Tier 1 Global Banking & Payments bank is looking for a Senior Site Reliability Engineer to lead and accelerate its transformation from traditional L2 production support toward an SRE operating model.
This role is an Inside IR35 contract with an initial length of 12 months and is likely to extend. The office is in Bromley, South London, and you will be required on site 3 days a week.
This role will help define, implement, and embed SRE practices across critical payment and banking services, enabling measurable reliability outcomes, reduced manual toil, stronger automation, and improved service visibility.
The successful candidate will bring proven, hands‑on experience implementing SRE in a large corporate bank and will be able to influence across operations, engineering, and product partners to institutionalize SRE practices at scale.
What You Will Do (Key Responsibilities)
SRE Operating Model and Transformation
* Lead the design and execution of the SRE adoption approach across Global Banking & Payments, including the transition path from traditional L2 support to reliability engineering.
* Establish practical engagement patterns between SRE, application teams, and platform teams and help teams adopt a consistent way of working.
Reliability Measurement and Decisioning
* Drive adoption of Critical User Journeys, Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets for priority services, ensuring metrics reflect user experience and business outcomes.
* Help teams implement error‑budget based decisioning that balances reliability, delivery velocity, and operational risk.
Toil Reduction, Automation, and Engineering Excellence
* Identify operational toil and lead initiatives to eliminate it through automation, self‑healing patterns, runbook automation, and operational tooling improvements.
* Establish and implement a model to partner with engineering teams to build reliability into services through design improvements, improved instrumentation, and resilience patterns.
Incident and Problem Management Excellence
* Improve production outcomes through strong incident response practices, including major incident triage support, root cause analysis, post‑incident reviews, and preventive engineering actions.
* Strengthen problem management with a focus on reducing repeat incidents, technical debt risk, and manual intervention.
Observability and Tooling Enablement
* Establish practical observability standards across logs, metrics, traces, dashboards, and alerting to reduce noise, improve signal quality, and shorten time to detect and restore service.
* Partner across platform, tooling, and service management teams to align SRE needs to enterprise tooling and processes.
* Work with tools such as Splunk, Dynatrace, and OpenTelemetry (OTel) to instrument end‑to‑end observability for services, ensuring teams are able to adopt and use the platforms.
Stakeholder Management and Change Leadership
* Influence leaders across operations, engineering, and product to adopt SRE principles and measurable reliability goals.
* Communicate clearly with senior stakeholders, including executive updates on progress, adoption, and outcomes.
Required Qualifications
* Significant experience in Site Reliability Engineering and implementing SRE practices across large‑scale, complex services.
* Demonstrated experience leading an SRE transformation in a corporate banking environment (or similarly regulated financial services enterprise).
* Proven ability to implement and scale SLO/SLI and error‑budget approaches, and to operationalize them across multiple teams and services.
* Strong engineering background with the ability to drive automation and reduce manual toil through code, tooling, and process redesign.
* Deep knowledge of incident response, problem management, root cause analysis, and operational resilience practices in mission‑critical environments.
* Strong stakeholder management skills, able to influence across technology and business partners and communicate effectively at senior levels.
Preferred Qualifications
* Experience supporting payments, cash management, or other high‑availability banking platforms with 24x7 operational expectations.
* Experience designing observability approaches and improving alert quality across large portfolios.
* Experience building SRE communities of practice, training pathways, or structured enablement programs across a global organization.
* Familiarity with enterprise service management tooling and production governance in large banks.