Site Reliability Engineer, Contract (12m), Leeds OR Bristol (hybrid, 2 days a week)
Multiple Site Reliability Engineers are needed for a national core banking project, joining a team that supports R&D projects across a large GCP environment.
You will be the bridge between development and infrastructure, focusing on the availability, latency, performance, and capacity of GCP-hosted services.
You won’t just be managing servers; you’ll be building the automated systems that manage them for you. Our goal is to maintain a high "feature velocity" while ensuring our error budget remains intact.
Key Responsibilities
* Infrastructure as Code (IaC): Architect and maintain scalable infrastructure using Terraform.
* Kubernetes Orchestration: Manage and optimize our GKE (Google Kubernetes Engine) clusters, including autoscaling, networking, and security policies.
* Observability: Implement and refine monitoring, logging, and tracing using Google Cloud Operations Suite (formerly Stackdriver), Prometheus, or Grafana.
* Post-Mortems: Run post-mortems to ensure we learn from every production hiccup.
* CI/CD Optimization: Streamline our deployment pipelines using Cloud Build, GitLab CI, or GitHub Actions to ensure "boring" and predictable releases.
Technical Requirements
* GCP Expertise: Deep experience with core services (Compute Engine, Cloud Run, Pub/Sub, Cloud Spanner/SQL, and IAM).
* Networking: Solid understanding of VPCs, Load Balancing, DNS, and TLS/SSL.
* Coding: Ability in Python, Go, or Java is a bonus.
* SRE Mindset: Familiarity with the Google SRE Book concepts (SLIs, SLOs, and Error Budgets).
Preferred Qualifications
* Background in managing large-scale data pipelines or distributed systems.