I am partnering with a leading tech organisation to recruit a Senior Site Reliability Engineer on a day-rate contract for 12 months. This is a hands‑on, high‑impact role working closely with engineering teams to drive reliability, scalability, and operational excellence across critical production systems.
What You’ll Do:
* Lead reliability initiatives and own operational performance across core services
* Define and refine SLIs, SLOs, and error budgets aligned with business outcomes
* Drive sophisticated incident management, post‑incident analysis, and remediation planning
* Influence system architecture for high availability, resilience, and multi‑region disaster recovery
* Build automation and CI/CD pipelines, applying safe deployment patterns like canary, blue/green, or progressive delivery
* Develop observability solutions (metrics, logs, traces) and troubleshoot performance bottlenecks
* Mentor engineers and embed SRE best practices across the organisation
* Operate cloud‑native and containerised workloads at scale, leveraging IaC tools to manage resilient platforms
What You Bring:
* 7+ years in site reliability, production, or systems engineering roles
* Hands‑on experience with cloud platforms (AWS, Azure, GCP) and Kubernetes
* Strong programming skills (Python, Go, Java) for automation and tooling
* Proven experience leading high‑severity incidents and delivering systemic improvements
* Deep understanding of distributed systems, fault isolation, and scalability
Bonus Experience:
* Multi‑cloud or multi‑region resilience architecture
* Observability tools (Prometheus, Grafana, Datadog)
* IaC experience (Terraform, CloudFormation)