Contract: Site Reliability Engineer (Linux Administration & Server Hardware)
Location: Glasgow (hybrid - 3 days onsite)
Duration: 6 months
Day Rate: Negotiable (Inside IR35 via umbrella solution)
Reference: 20460
We are looking for an experienced Linux Site Reliability Engineer (SRE) to join a high-performing infrastructure support team focused on maintaining and improving critical platform reliability within a large-scale enterprise environment.
This position will focus on resolving hardware and platform-related incidents escalated from the L3 support team. The successful candidate will have strong Linux systems expertise, hands-on physical server troubleshooting experience, and a proactive approach to operational improvement, automation, and incident reduction.
Essential Skills / Requirements
* Strong Linux administration and troubleshooting skills (process, networking basics, logs, package/service management).
* Solid understanding of server hardware and peripherals (disks, RAID/HBA, NICs, firmware) and how failures present at OS level.
* Experience with out-of-band management / lights-out technologies (e.g., iDRAC, iLO, IPMI/Redfish) for remote troubleshooting and recovery.
* Proven ability to own incidents end-to-end: triage, identify mitigations/workarounds, coordinate with L3/engineering, communicate status, and drive to resolution.
* Understanding of SRE operational practices and metrics (e.g., SLO/SLI concepts, error budgets, MTTD/MTTR) and a continuous-improvement mindset.
* Strong communication skills (written and verbal): clear incident updates, customer/stakeholder management, and effective escalation and handoffs.
* Strong documentation skills: writing clear runbooks/procedures, contributing to knowledge bases, and participating in post-incident reviews/root cause analysis.
Nice to Have / Desired Skills
* Scripting and automation skills (e.g., Bash, Python) to build small tools, checks, and workflow automation that reduce toil.
* Familiarity with virtualization and containerization concepts/operations (e.g., VMware/KVM, Docker, Kubernetes) and using automation to support these environments.
* Experience with monitoring/observability and alerting workflows (dashboards, log analysis, alert tuning) and translating signals into actionable response steps.
Networking People (UK) is acting as an Employment Business in relation to this vacancy.