We are looking for a highly skilled engineer with expertise in Python programming, automation, and modern observability practices to help build and operate scalable distributed systems for an award-winning London hedge fund. This role sits at the intersection of platform engineering, AI tooling, and system reliability. You will design automation frameworks, develop AI-assisted engineering tools, and implement observability solutions that provide deep insights into complex distributed architectures.
Responsibilities
* Design, develop, and maintain robust automation solutions using Python.
* Build and maintain observability pipelines including metrics, logs, and traces across distributed systems.
* Develop internal AI-powered tools that enhance engineering productivity and operational intelligence.
* Implement monitoring, alerting, and diagnostics to improve system reliability, performance, and scalability.
* Integrate observability platforms with automation workflows and incident response systems.
* Collaborate with platform, infrastructure, data, and development teams to improve system visibility and operational maturity.
* Design tooling that enables proactive detection, analysis, and remediation of system issues across distributed environments.
* Contribute to architecture decisions around telemetry, AI-assisted debugging, and automation frameworks.
* Directly support business users and stakeholders with system analysis, problem management, and technical resolution.
Skills & Experience
* Strong professional experience with Python development in production environments.
* Proven experience building automation frameworks, scripts, and developer tooling.
* Strong experience working with distributed systems and large-scale service architectures.
* Hands-on experience working with Kubernetes in production environments.
* Deep understanding of observability practices, including metrics, logs, tracing, and telemetry pipelines.
* Experience integrating AI or machine learning tooling into engineering workflows.
* Strong understanding of APIs, microservices, and containerised environments.
* Experience with CI/CD pipelines and infrastructure automation.
* Ability to design scalable, maintainable engineering tools.
* Experience supporting business users directly, coordinating projects or problems with development and infrastructure teams, and taking project ownership.
Interesting Technologies
* Observability: OpenTelemetry, Prometheus, Grafana, Elastic Stack (ELK), Jaeger
* Automation & CI/CD: GitHub Actions, Jenkins, GitLab CI, Argo Workflows
* Distributed Systems & Messaging: Kafka, Redis, gRPC
Offer
* Award-winning, world-class technology environment with best-in-class engineering teams.
* Fast-paced, low-bureaucracy culture with a get-stuff-done mindset.
* Up to £150,000 base salary, plus a 50%-100% annual cash bonus. Benefits include pension, healthcare, gym, food, and 30 days of holiday.
* 4 days onsite, 1 day working from home.
* The chance to shape the future of intelligent automation and operational insight in distributed platforms.