
Site reliability engineer

London
Sphere Digital Recruitment
Posted: 17h ago
Offer description

Job Description

Site Reliability Engineer

Contract - 12 months

Inside IR35

Hybrid working

£400-550 per day depending on experience

My client is looking for a skilled Senior Site Reliability Engineer to play a key role in improving the reliability, scalability, and operational performance of their production systems. This role works closely with product and engineering teams to enhance system reliability, architecture, deployment safety, and observability.

Role Summary

My client is seeking a Senior Site Reliability Engineer to join a centralized Technical Operations function, where you will lead reliability initiatives and support operations across a range of large-scale, customer-facing digital services.

Operating within a centralized SRE model, you will partner with product and engineering teams while maintaining shared responsibility for production reliability, resilience, and scalability. The role includes participation in an on-call rotation supporting critical services, with shared ownership of overall system health.

You will be responsible for defining reliability standards, influencing architectural improvements, managing complex incidents, and building automation to improve deployment safety and operational efficiency. Your work will directly support high-traffic systems used by a global audience.

Key Responsibilities

Reliability & Risk Engineering

My client is looking for someone who can:

* Identify systemic reliability risks and drive long-term preventative improvements
* Define and refine SLIs, SLOs, and error budgets aligned with business and customer outcomes
* Lead complex incident management, post-incident reviews, and remediation planning
* Demonstrate depth in networking fundamentals; troubleshooting network infrastructure is key
* Bring experience working as a senior SRE, particularly with AWS

Architecture & Resilience

You will:

* Review and influence system architecture to improve scalability, availability, and fault isolation
* Design strategies for high availability, graceful degradation, and disaster recovery
* Evaluate trade-offs between performance, cost, and operational risk

CI/CD & Deployment Safety

The successful candidate will:

* Improve deployment pipelines and implement automation to reduce risk and accelerate delivery
* Implement safe deployment strategies such as canary releases and blue/green deployments
* Ensure strong rollback and recovery mechanisms

Observability & Performance

You will be expected to:

* Build and enhance observability solutions including metrics, logging, and tracing
* Work with teams to reduce alert fatigue and improve signal quality
* Diagnose performance bottlenecks across infrastructure and applications

Infrastructure & Automation

My client is seeking someone who can:

* Design and operate cloud-native, containerised workloads at scale
* Use Infrastructure as Code to build and manage resilient platforms
* Develop automation to reduce manual effort and operational risk

Cross-Functional Leadership

You will:

* Mentor engineers and promote SRE best practices across teams
* Collaborate with engineering, product, and security stakeholders to improve system reliability

Required Qualifications

My client is looking for candidates with:

* A degree in Computer Science, Engineering, or equivalent practical experience
* Strong experience designing and operating CI/CD systems with deployment safety practices
* Excellent communication skills with the ability to influence cross-functional teams
* 7+ years of experience in SRE, production engineering, or systems engineering roles
* Strong knowledge of distributed systems concepts, including consistency and failure handling
* Hands-on experience with major cloud platforms (e.g., AWS, GCP, Azure), including multi-region environments
* Strong experience with Kubernetes and container orchestration at scale
* Proficiency in at least one programming language such as Go, Python, or Java
* Proven experience managing high-severity incidents and leading remediation efforts

Preferred Qualifications

Ideally, candidates will also have:

* Experience with multi-region or multi-cloud architectures
* Familiarity with observability tools such as Prometheus, Grafana, or Datadog
* Previous mentoring or technical leadership experience
* Experience with Infrastructure as Code tools such as Terraform or CloudFormation
* Exposure to AI-assisted tooling for incident analysis or operational efficiency

Sphere Digital Recruitment is acting as an Employment Business in relation to this vacancy.
