AWS Site Reliability Engineer (Data Platform) – Contract
Location: Glasgow
Contract Length: February 2026 – January 2027
Role Overview
We are recruiting an AWS Site Reliability Engineer (SRE) to support a cloud-native data platform for a major international financial services organisation. The platform is built on AWS, with core components including Snowflake and Databricks, and underpins critical analytics and data services used across the business.
This role focuses on reliability engineering, automation, observability, and resilience. You will work closely with data engineering and platform teams to ensure the platform is scalable, highly available, and operationally robust in a regulated environment.
Key Responsibilities
* Design, build, and maintain automation for infrastructure provisioning, platform operations, and incident response using Infrastructure as Code (IaC) and CI/CD
* Lead resiliency and disaster recovery (DR) planning, including DR testing, failure-scenario planning, and recovery validation across AWS and data platform services
* Define and manage SLIs, SLOs, and SLAs for critical data pipelines and platform services, using error budgets to drive reliability improvements
* Build and operate comprehensive observability solutions (metrics, logs, traces, alerting) across AWS, Snowflake, and Databricks workloads
* Partner with data engineering and platform teams to embed reliability-by-design into architecture and delivery
* Perform root cause analysis (RCA) on incidents and drive continuous improvement to reduce operational toil
* Own and drive resolution of incidents and service requests raised by platform consumers, identifying recurring issues and automating fixes to improve reliability and user experience
Required Skills & Experience
* Strong practical experience applying Site Reliability Engineering (SRE) principles, including SLO/SLI/SLA design and error budgets
* Proven production experience with AWS (e.g. EC2, S3, IAM, VPC, CloudWatch)
* Hands‑on experience with automation and Infrastructure as Code (Terraform, CloudFormation, or CDK)
* Experience building and operating observability and monitoring solutions
* Scripting experience in Python and/or Bash
* Exposure to data platforms such as Snowflake and/or Databricks