Sr. Network Site Reliability Engineer (SRE)
London, United Kingdom | Posted on 09/12/2025
We provide end-to-end IT solutions and services, including application services, data and analytics services, AI/ML technologies, and professional services, in the UK and EU markets.
Job Description
Overview
We are seeking a highly experienced Senior Network SRE with deep expertise across multi-vendor network infrastructure, automation, and reliability engineering. The ideal candidate will possess strong technical leadership, hands‑on engineering capabilities, and a passion for building resilient, scalable, and observable network environments.
Key Responsibilities
* Design, implement, and maintain highly available network solutions across routing, switching, firewalling, and wireless technologies.
* Apply SRE principles to improve network reliability, scalability, and performance.
* Develop and maintain automation workflows using Ansible, Salt, and related frameworks to reduce operational toil.
* Build and operate monitoring, alerting, and observability dashboards using tools such as Grafana and Splunk.
* Proactively identify network bottlenecks, performance issues, and reliability risks, and implement long‑term fixes rather than reactive workarounds.
* Support incident response, root cause analysis, and post‑incident reviews with a focus on continuous improvement.
* Collaborate with cross‑functional engineering, security, and operations teams to ensure network solutions meet business and technical requirements.
* Contribute to documentation, runbooks, design artifacts, and operational standards.
* Participate in capacity planning, network modernization initiatives, and automation‑first strategies.
Required Skills & Experience
* 10+ years of hands‑on experience in enterprise or service provider network engineering.
* Expertise in multi‑vendor routing, switching, firewalling, and wireless technologies.
* Deep understanding of network protocols (BGP, OSPF, EIGRP, STP, VXLAN, VPNs, QoS, MPLS, etc.).
* Strong experience with infrastructure automation using Ansible and Salt.
* Proficiency with observability tooling such as Grafana, Splunk, or equivalent.
* Solid understanding of SRE practices including SLIs, SLOs, error budgets, and proactive reliability.
* Strong troubleshooting, analytical, and performance optimization skills.
* Excellent communication and collaboration skills, with the ability to influence and guide technical stakeholders.
Nice to Have
* Experience with network programmability (Python, API‑driven networking, NETCONF/RESTCONF).
* Exposure to cloud networking (AWS, Azure, GCP).
* Knowledge of zero‑trust, SD‑WAN, and network security best practices.
* Experience creating self‑healing or fully automated network workflows.