Salary: £30,000 - £70,000 per year

Requirements: You have strong Linux administration and troubleshooting skills, including process management, basic networking, log analysis, and package and service management. You understand server hardware and peripherals, including disks, RAID controllers and HBAs, NICs, and firmware, and how their failures surface at the operating system level. You have experience with out-of-band management and lights-out technologies such as iDRAC, iLO, IPMI, or Redfish for remote troubleshooting and recovery. You can own incidents end to end, including triage, mitigation, coordination with L3 and engineering teams, status communication, and resolution. You understand SRE operational practices and metrics such as SLOs, SLIs, error budgets, MTTD, and MTTR, and you work with a continuous improvement mindset. You have strong written and verbal communication skills for incident updates, stakeholder management, and escalation handoffs. You have strong documentation skills, including writing runbooks and procedures, contributing to knowledge bases, and supporting post-incident reviews and root cause analysis. You have scripting and automation skills in languages such as Bash or Python to reduce toil. You are familiar with virtualization and containerization concepts and operations, using platforms such as VMware, KVM, Docker, or Kubernetes. You have experience with monitoring, observability, and alerting workflows, including using dashboards, analysing logs, tuning alerts, and translating signals into actions.

Responsibilities: We resolve hardware and platform-related incidents escalated from the L3 support team. We maintain and improve platform reliability within a large-scale enterprise environment. We troubleshoot Linux systems and physical server issues to identify root causes and effective mitigations. We coordinate with L3 and engineering teams to drive incidents through to resolution. We provide clear incident updates and communicate effectively with customers and stakeholders.
We document procedures, create and improve runbooks, and contribute to knowledge management. We take part in post-incident reviews and root cause analysis to help reduce future incidents. We identify opportunities for operational improvement, automation, and toil reduction. We use monitoring and observability signals to support incident response and improve alerting workflows.

Technologies: Bash, Docker, Firmware, Hardware Support, KVM, Kubernetes, Linux, Python, VMware

More: We are a high-performing infrastructure support team focused on maintaining and improving critical platform reliability in a large-scale enterprise environment. This is a six-month contract role based in Glasgow with a hybrid working pattern of three days onsite each week. The day rate is negotiable, and the engagement is inside IR35 via an umbrella solution. We are looking for an experienced Linux Site Reliability Engineer to help resolve hardware and platform-related incidents, strengthen operational stability, and support continuous improvement across our environment.

Last updated: week 20 of 2026