Data Network Engineer - SRE, Telemetry, Observability, Monitoring & Performance
Seeking a Network Engineer with experience in Telemetry, Observability, Monitoring & Performance, ideally within a high-availability Network Infrastructure Site Reliability Engineering environment. The network strategy is strongly focused on Next-Gen, Software Defined Networking, and in this role you will work at the intersection of software engineering, Networks SRE, and platform operations & engineering, with the ultimate aim of developing actionable insights from telemetry data and enhancing the value of observability tooling.
Previous experience might include:
* Collaborating cross-functionally to ensure observability is embedded into the SDLC & CI/CD pipelines.
* Designing & implementing telemetry pipelines for metrics, logs, traces, and events.
* Developing observability standards, NMS tooling, dashboards, alerting frameworks, and SLOs.
* Integrating & optimising observability tools such as OpenTelemetry, Prometheus, Grafana, Splunk & Elastic.
This role will require:
* Previous experience in Network/Platform Observability, Networks SRE, or Platform Engineering roles within complex, distributed environments.
* Strong expertise with telemetry tools such as OpenTelemetry, Prometheus, Grafana, Splunk, Elastic, Loki, Jaeger, or similar.
* Proficiency in at least one programming language (e.g. Python, Go, Java) and infrastructure-as-code tools (e.g. Terraform, Helm).
* Deep understanding of cloud-native architectures (Kubernetes, microservices, service meshes).
Highly desired:
* Industry experience in areas such as Media/Streaming, High Frequency Trading (e.g. Investment Banking), Online Gaming, Hyperscalers, and High Availability, Low Latency Network Infrastructure.