Principal Software Engineer
Location: Cambridge

Our client is scaling a large, distributed cloud platform and is looking for a Principal Engineer to act as the Subject Matter Expert (SME) across observability and cloud infrastructure. You'll be working at serious scale: managing thousands of Kubernetes nodes, handling tens of terabytes of logs daily, and supporting millions of real-time metrics across a highly distributed environment.

The Role
This is a senior, hands-on role in which you will own the technical direction and standards of the observability ecosystem. As the SME, you'll define best practice, guide architectural decisions, and act as the go-to expert across engineering teams, ensuring systems are scalable, cost-efficient, and high-performance.

Key Responsibilities
Act as the SME for observability and cloud infrastructure across the organisation
Lead architecture across metrics, logs, and tracing systems
Design and optimise high-throughput data pipelines and storage layers
Implement strategies such as sampling, aggregation, and down-sampling
Extend and enhance open-source observability tools at scale
Partner with engineering teams to standardise tooling and improve adoption
Drive reliability, scalability, and cost optimisation across the platform
Define and promote best practices aligned with OpenTelemetry and modern observability standards
Mentor engineers and raise engineering quality across teams

Tech Environment
Kubernetes at scale (thousands of nodes)
High-volume telemetry (hundreds of thousands of events per second)
Observability stack: Prometheus, Grafana, Loki, Tempo, ELK/OpenSearch, ClickHouse
Multi-cloud (AWS, GCP)
Infrastructure as code (Terraform), CI/CD pipelines

What We're Looking For
15 years of experience building and scaling distributed systems
Strong hands-on experience with Golang (plus Python or Shell)
Deep expertise in observability at scale
Strong Kubernetes and cloud infrastructure experience
Proven ability to design systems for performance, scale, and cost efficiency
Experience with service mesh technologies (e.g. Istio/Envoy)
Ability to operate as a technical authority and trusted advisor across teams

Nice to Have
Open-source or CNCF contributions
Experience using AI tools to improve engineering efficiency

Why Join
Be the go-to expert shaping a large-scale observability platform
Work on complex, high-impact infrastructure challenges
Strong ownership and influence at Principal level