Lead site reliability engineer

Chelmsford

Robert Half

Site reliability engineer

Posted: 27 March

Offer description

Site Reliability Engineer Lead (SRE) – AWS | Observability | Incident Management

Robert Half International (an S&P 500 global staffing provider) is supporting a global consulting firm in sourcing an interim SRE Lead for a major financial services engagement. This role will focus on improving platform reliability, stabilising production environments, and embedding best-in-class SRE practices across a complex, high-availability estate.

Assignment Details

* £500–£550 per day via PAYE, plus an additional 12.07% holiday pay on top of the day rate (employer’s NI and tax deducted at source; unlike umbrella companies, there are no umbrella company admin fees)
* Initial 6-month contract
* Hybrid working – 2–3 days per week in the City of London
* Start date: c. 2–4 week turnaround for onboarding paperwork, with an anticipated start date of w/c 01/05

Key Responsibilities

* Lead and improve incident management processes (detection, triage, escalation, resolution)
* Drive major incident response (P1/P0) and post-incident reviews (blameless postmortems)
* Define and implement SRE principles including SLIs, SLOs, SLAs and error budgets
* Build and enhance observability frameworks across metrics, logs and tracing
* Drive automation and reduction of manual toil across operational processes
* Implement runbooks, playbooks and operational readiness standards
* Work closely with engineering, platform and security teams to embed reliability into delivery
* Support the design of resilient, highly available systems (failover, DR, multi-region)

Key Skills & Experience

* Proven experience in SRE, DevOps or Platform Engineering roles within complex environments
* Strong hands-on experience with incident management and production support at scale
* Deep experience with observability tooling (e.g. Prometheus, Grafana, Datadog, ELK, OpenTelemetry)
* Solid experience with AWS cloud environments (EKS/ECS, Lambda, API Gateway, etc.)
* Experience with CI/CD pipelines, automation and Infrastructure as Code (Terraform, Ansible, etc.)
* Strong understanding of system reliability, performance and resilience engineering principles
* Experience working in regulated or high-availability environments (financial services preferred)

Nice to Have

* Experience with chaos engineering or resilience testing
* Exposure to AIOps or intelligent automation frameworks
* Experience transitioning or improving outsourced / offshore support models

All candidates will be required to complete standard screening checks, including Right to Work, financial background checks and references covering the last 5 years.
