Site Reliability Engineer
Fully Remote (UK Only)
£60,000 – £65,000
We’re working with a fast-growing online retailer that’s hiring its first-ever SRE.
This is your chance to take full ownership of reliability across a high-traffic, customer-facing platform – setting the standards, choosing the tools, and building everything from scratch.
There are no legacy systems to untangle and no red tape.
You’ll work across the full stack – from infrastructure and automation to observability and incident response – with a direct line into the product, engineering, and security teams.
The business is investing heavily in performance, uptime, and scalability. SRE is a key part of that strategy.
Tech stack includes:
AWS, Azure, Docker, Kubernetes, Prometheus, Grafana, Linux, and Cloudflare – but there’s full freedom to bring in new ideas and better tools if they help.
What you’ll be doing:
* Designing and building reliable infrastructure
* Automating deployment and scaling
* Improving observability and response times
* Leading incident management and root cause analysis
* Helping dev teams ship faster, with fewer issues
What they’re looking for:
* Solid experience with cloud platforms (AWS or Azure)
* Strong skills in containers and orchestration (Docker, Kubernetes)
* Good knowledge of monitoring and observability (Prometheus, Grafana, etc.)
* Strong Linux background
* Bonus: experience with Cloudflare or web security
Flat structure. No micromanagement. Just a smart team moving fast and building things that scale.