Site Reliability Engineer Lead (SRE) - AWS | Observability | Incident Management
Robert Half International (an S&P 500 global staffing provider) is supporting a global consulting firm in sourcing an interim SRE Lead for a major financial services engagement. This role will focus on improving platform reliability, stabilising production environments, and embedding best-in-class SRE practices across a complex, high-availability estate.
Assignment Details
1. £500-£550 per day via PAYE, plus an additional 12.07% holiday pay on top (employer's NI and tax are deducted at source, unlike with umbrella companies, and there are no umbrella company admin fees)
2. Initial 6 month contract
3. Hybrid working - 2-3 days per week in the City of London
4. Start date: c. 2-4 week turnaround for onboarding paperwork, with an anticipated start date of w/c 01/05
Key Responsibilities
1. Lead and improve incident management processes (detection, triage, escalation, resolution)
2. Drive major incident response (P1/P0) and post-incident reviews (blameless postmortems)
3. Define and implement SRE principles including SLIs, SLOs, SLAs and error budgets
4. Build and enhance observability frameworks across metrics, logs and tracing
5. Drive automation and reduction of manual toil across operational processes
6. Implement runbooks, playbooks and operational readiness standards
7. Work closely with engineering, platform and security teams to embed reliability into delivery
8. Support the design of resilient, highly available systems (failover, DR, multi-region)
Key Skills & Experience
1. Proven experience in SRE, DevOps or Platform Engineering roles within complex environments
2. Strong hands-on experience with incident management and production support at scale
3. Deep experience with observability tooling (e.g. Prometheus, Grafana, Datadog, ELK, OpenTelemetry)
4. Solid experience with AWS cloud environments (EKS/ECS, Lambda, API Gateway, etc.)
5. Experience with CI/CD pipelines, automation and Infrastructure as Code (Terraform, Ansible, etc.)
6. Strong understanding of system reliability, performance and resilience engineering principles
7. Experience working in regulated or high-availability environments (financial services preferred)
Nice to Have
1. Experience with chaos engineering or resilience testing
2. Exposure to AIOps or intelligent automation frameworks
3. Experience transitioning or improving outsourced / offshore support models
All candidates will be required to complete standard screening checks, including Right to Work, financial background checks and references covering the last 5 years.
Robert Half Ltd acts as an employment business for temporary positions and an employment agency for permanent positions. Robert Half is committed to diversity, equity and inclusion. Suitable candidates with equivalent qualifications and more or less experience can apply. Rates of pay and salary ranges are dependent upon your experience, qualifications and training.