Observability engineering lead

London

McGregor Boyall Associates Limited

Engineering

Posted: 23 February

Offer description

Requirements

Must have:

- Deep expertise in designing, implementing, and configuring modern observability stacks, specifically Prometheus, Grafana, and associated tooling.
- Strong instrumentation strategy (exporters, service discovery, custom metrics).
- Advanced PromQL skills for complex querying and performance analysis.
- Experience building recording/alerting rules and optimizing metric ingestion.
- Knowledge of HA architectures, federation, sharding, and long-term storage (Thanos, Cortex, Mimir).
- Grafana dashboard and panel design focused on performance and operator clarity.
- Best-practice alert configuration and routing.
- Experience with synthetic monitoring (Grafana Synthetic Monitoring, Blackbox exporter).
- Log ingestion/analysis (Loki).
- Familiarity with Real User Monitoring tooling (e.g., Grafana Faro).
- Strong API and automation skills for dashboard provisioning, alert management, and data ingestion.
- Experience integrating the Grafana/Prometheus ecosystem with logging, tracing, and event platforms (Loki, Tempo, OpenTelemetry).

Responsibilities:

- Drive the uplift, resilience, and effectiveness of our monitoring ecosystem.
- Partner with engineering teams to deliver world-class insights through metrics, dashboards, alerts, and automation.
- Influence standards, modernise tooling, and enhance visibility across complex distributed systems.
- Collaborate with Application Stewards and SREs to validate critical assets for monitoring verification and uplift.
- Analyse Prometheus scrape coverage, exporter deployment, and Grafana dashboard availability for critical services.
- Identify and implement improvements across monitoring configurations, alert quality, data models, dashboards, KPIs, SLIs, and SLOs.
- Review roles and responsibilities across observability functions and recommend enhancements aligned to Operational Resilience standards.
- Contribute to delivering automated, end-to-end business flow visibility, surfaced in Grafana through service maps, dependency visualisation, or topology integrations.
- Ensure alerting configurations are reliable, actionable, and noise-optimised, following Alertmanager best practices.

Company:

We are seeking a highly skilled Observability Engineering Lead to shape how we detect, diagnose, and prevent issues across our critical applications. This is a hands-on technical leadership position with a pivotal role in our team, working onsite 2 days a week. We are committed to fostering an inclusive environment as we strive for excellence in our monitoring practices.
