Salary: £100,000 per year

Requirements:
- 15 years building and scaling distributed systems
- Strong hands-on experience with Golang (plus Python or Shell)
- Deep expertise in observability at scale
- Strong Kubernetes and cloud infrastructure experience
- Proven ability to design systems for performance, scale, and cost efficiency
- Experience with service mesh technologies (e.g. Istio/Envoy)
- Ability to operate as a technical authority and trusted advisor across teams

Nice to Have:
- Open-source or CNCF contributions
- Experience using AI tools to improve engineering efficiency

Responsibilities:
- Act as the SME for observability and cloud infrastructure across the organization
- Lead architecture across metrics, logs, and tracing systems
- Design and optimize high-throughput data pipelines and storage layers
- Implement strategies such as sampling, aggregation, and down-sampling
- Extend and enhance open-source observability tools at scale
- Partner with engineering teams to standardize tooling and improve adoption
- Drive reliability, scalability, and cost optimization across the platform
- Define and promote best practices aligned with OpenTelemetry and modern observability standards
- Mentor engineers and elevate engineering quality across teams

Technologies: AI, Cloud, Golang, Istio, Kubernetes, OpenTelemetry, Python, AWS, OpenSearch, CI/CD, ClickHouse, DevOps, ELK, GCP, Grafana, Prometheus, Terraform

More:
We are scaling a large, distributed cloud platform and are looking for a Principal Engineer to act as the Subject Matter Expert across observability and cloud infrastructure. You will work at serious scale: managing thousands of Kubernetes nodes, handling tens of terabytes of logs daily, and supporting millions of real-time metrics across a highly distributed environment. This is a senior, hands-on role in which you will own the technical direction and standards of the observability ecosystem, with strong ownership and influence at the Principal level.

Last updated: week 17 of 2026