Site Reliability Engineer (SRE) - Data & Insights
Lloyds Banking Group
Base pay range: £70,929 - £78,810 per annum. Full‑time. Hybrid working pattern.
About the role
We're looking for a Data Site Reliability Engineer (SRE) to join our Personalised Experiences & Communications (PEC) Customer Intelligence Lab team within the PEC Platform. You will ensure the reliability, scalability, security and operability of PEC's real‑time and micro‑batch data products on GCP. You will partner with Data Engineering, Architecture and Platform teams to design SLOs and SLIs, automate operations, reduce toil, harden controls, and lead incident and problem management across ingress and egress decisioning systems and ingestion into ODPs (Origin Data Products) and FDPs (Foundation Data Products).
Day to day
* Operate and support real‑time and batch data pipelines.
* Design and maintain SLOs, SLIs, and error budgets across our data systems, driving continuous improvements in reliability (a brief illustrative sketch follows this list).
* Partner with Data Engineers, full‑stack Software Engineers, and Platform teams to automate deployments and manage infrastructure using IaC (Terraform, CloudFormation, etc.).
* Lead root cause analysis and post‑incident reviews, embedding reliability learnings into our platform and processes.
* Own the data product pipelines that the team builds in the lab, from go‑live through to delivery.
* Ensure the reliability of the lab, recommending improvements that keep our data robust and under control.
* Collaborate with other teams within the PEC Platform to build GCP‑based products.
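For illustration only (not part of the role description): a minimal Python sketch of the error‑budget arithmetic behind the SLO/SLI work above, assuming a simple availability SLO measured as a good‑events/total‑events ratio over a fixed window. The function name and figures are hypothetical.

```python
# Illustrative sketch only (hypothetical numbers): remaining error budget for an
# availability SLO, computed from good vs. total events over a reporting window.
def error_budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Return the fraction of the error budget still unspent for the window."""
    allowed_failure_rate = 1.0 - slo_target                     # e.g. 0.001 for a 99.9% SLO
    observed_failure_rate = 1.0 - (good_events / total_events)  # the SLI, inverted
    budget_used = observed_failure_rate / allowed_failure_rate
    return max(0.0, 1.0 - budget_used)

# Example: 9,995,000 good out of 10,000,000 events against a 99.9% SLO
# leaves half of the window's error budget unspent.
print(error_budget_remaining(0.999, 9_995_000, 10_000_000))  # 0.5
```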
What you'll need
* SRE core: dashboarding, SLO/SLI design, error budgets, production change/incident/problem management, strong runbook craft.
* Strong background in Site Reliability Engineering, DevOps, or Data Engineering (4+ years).
* Experience with CI/CD pipelines and tools such as Jenkins and Harness.
* Proven experience with cloud platforms (GCP, AWS, or Azure) and containerisation (Kubernetes, Docker).
* Deep understanding of data infrastructure – streaming, batch, warehouse, and orchestration tools (e.g. Kafka, Airflow, Spark, dbt, Snowflake).
* Hands‑on experience with Infrastructure as Code (Terraform, CloudFormation, etc.).
* Familiarity with monitoring and observability tools.
* Proficiency in automation and scripting languages such as Python, Go, or Bash.
* Practical knowledge of how data pipelines work and how they enable machine learning.
* GCP data stack: Dataflow/Apache Beam (streaming & micro‑batch), Pub/Sub, BigQuery (including streaming inserts/Storage Write API), Cloud Composer/Airflow; Cloud Logging/Monitoring/Trace (see the sketch after this list).
* Streaming: Kafka fundamentals (partitions, consumer groups, compaction, schema governance), connectors and DLQs.
* Security & compliance on GCP.
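For illustration only: a minimal sketch of the kind of streaming pipeline the GCP data stack bullet above describes, reading from Pub/Sub and writing to BigQuery via Dataflow/Apache Beam with the Storage Write API. The project, topic, and table names are hypothetical placeholders, and the destination table is assumed to already exist.

```python
# Illustrative sketch only: Pub/Sub -> Dataflow (Apache Beam) -> BigQuery streaming pipeline.
# Project, topic, and table names below are hypothetical placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    options = PipelineOptions(streaming=True)  # run as a streaming job
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/example-project/topics/example-events")
            | "DecodeUtf8" >> beam.Map(lambda message: message.decode("utf-8"))
            | "ParseJson" >> beam.Map(json.loads)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:example_dataset.example_table",
                method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,      # Storage Write API
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,  # table assumed to exist
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```

In a production setting, a dead‑letter sink for failed parses and writes, plus Cloud Monitoring alerting on backlog and write errors, would typically sit alongside a pipeline like this.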
Additional experience that would be useful
* Experience with Azure or AWS public cloud platforms.
* Experience with real‑time streaming and event‑driven architectures.
* Exposure to machine learning infrastructure, feature stores, or model serving.
* Familiarity with data lineage, cataloguing, or metadata management tools.
* Experience contributing to open‑source or internal reliability initiatives.
About working for us
We are committed to building an inclusive organisation with diverse perspectives, and we welcome applications from under‑represented groups. We are Disability Confident and will provide reasonable adjustments throughout the recruitment process.
Benefits
* A generous pension contribution of up to 15%
* An annual performance‑related bonus
* Share schemes including free shares
* Benefits you can adapt to your lifestyle, such as discounted shopping
* 30 days' holiday, with bank holidays on top
* A range of wellbeing initiatives and generous parental leave policies
Ready for a career where you can have a positive impact as you learn, grow and thrive? Apply today and find out more.
Seniority level: Mid‑Senior level
Employment type: Full‑time