Site reliability engineer

Penrith

Halian

Posted: 5 June

Offer description

Are you among the top 1% of Site Reliability Engineers in the UK?

Check below to see whether you have what this opportunity requires, and if so, apply as soon as possible.

Our client, an IT Service Management company, is building a world-class SRE team to support a mission-critical Java-based platform used by millions. If you’re a hands-on engineer with a background in Linux systems, deep AWS expertise, and a passion for incident response, reliability, and scale, we want to hear from you.

What You’ll Be Doing:

Own and evolve our incident management and on-call processes

Ensure uptime, scalability, and security across a massive infrastructure footprint

Work with EKS, EC2, Load Balancers, VPC, CDK, Terraform, and CloudFormation

Write and maintain YAML, Python scripts, and internal tooling

Define and track SLAs, SLOs, and SLIs to drive reliability

Collaborate with platform engineers and developers to support a Java-based product

Operate in a manual, tool-light environment while helping us scale and automate

What We’re Looking For:

7–12 years of experience, with 5+ years in SRE roles

Strong Linux/System Admin foundation

Proven experience in live incident troubleshooting and root cause analysis

Deep AWS knowledge – you can speak to how you’ve used services like EKS, EC2, and Load Balancers in production

Experience with monitoring, alerting, capacity planning, and security best practices

Comfortable working in large-scale environments with thousands of endpoints

Clear communicator who can document and share knowledge across teams

Able to work independently and thrive in a globally distributed team
