[Up to c. £225k Comp Package | Hybrid Working - 3 Days in Office]
Role Overview
We’re representing a global trading and digital assets firm at the forefront of high-performance technology and infrastructure innovation. The business is seeking a Site Reliability & Infrastructure Engineer to help design, automate, and scale the systems that underpin its global trading platforms. This role sits within a high-performing 11-person infrastructure team that combines Site Reliability and Core Infrastructure responsibilities, owning everything from AWS cloud systems to on-prem deployments. The team is expanding to meet new strategic demands, including increased automation, enhanced observability, and the rollout of new colocation environments to support lower-latency trading. It’s a technically hands-on position that blends architecture, build, and operational ownership, suited to an engineer with curiosity, precision, and a drive to constantly improve how infrastructure is built and run.
Key Responsibilities
* Design, build, and maintain highly available infrastructure across both cloud (AWS) and on-prem environments
* Implement automation across the stack using Infrastructure-as-Code principles (Terraform, Ansible, or similar)
* Administer and optimise Kubernetes clusters across multiple regions, improving resilience, performance, and visibility
* Develop tools and scripts in Python or Go to automate monitoring, configuration, and incident response workflows
* Contribute to on-prem colocation expansion projects, introducing low-latency engineering practices into the infrastructure
* Optimise Linux systems for performance and reliability, including kernel tuning and networking configuration
* Partner with development and platform teams to embed SRE best practices, reducing manual toil through automation and observability
* Drive improvements in monitoring, alerting, and log collection pipelines to enhance system insight and uptime
* Participate in architecture and design reviews, guiding platform evolution with reliability and scale in mind
* Collaborate across disciplines to ensure seamless integration between infrastructure, applications, and security teams
What You’ll Bring
* 4+ years’ experience in Site Reliability, Infrastructure, or Platform Engineering within production environments
* Solid experience working with AWS and hybrid infrastructure
* Proven ability to manage Kubernetes clusters at scale (on-prem or EKS), including configuration and performance tuning
* Proficiency in Python, Go, or another programming language, with a willingness to code daily
* Strong Linux engineering skills: comfortable with system internals, troubleshooting, and performance optimisation
* Knowledge of network fundamentals (TCP/IP, routing, DNS, firewalls) and how they apply in high-performance environments
* Familiarity with automation tooling such as Terraform or Ansible
* Experience building or maintaining CI/CD pipelines and GitOps workflows
* A proactive, analytical mindset: eager to explore, ask the right questions, and challenge the status quo
* (Preferred) Exposure to low-latency systems, colocation deployments, or real-time trading platforms