Site Reliability Engineer (SRE) – Kubernetes & Cloud | FTE | Basildon (Onsite – 5 Days)
Position Summary
We are seeking a highly skilled Site Reliability Engineer (SRE) with strong expertise in Kubernetes, Cloud (AWS / Azure / GCP), and modern automation practices. The SRE will design, deploy, automate, and maintain highly available, scalable, and secure cloud‑native and containerised applications.
You will collaborate closely with Development, Operations, and Security teams to ensure system reliability, performance, and efficiency across production environments.
Key Responsibilities
* Design, deploy, and manage Kubernetes clusters (EKS, AKS, GKE, or On‑Prem) for microservices architectures.
* Automate infrastructure provisioning and application deployment using Terraform, Helm, CloudFormation, or equivalent IaC tools.
* Implement and manage CI/CD pipelines (Jenkins, GitLab CI, ArgoCD) for fast, reliable software delivery.
* Ensure system reliability, availability, and security through proactive monitoring, incident response, and root cause analysis.
* Utilise observability tools – Prometheus, Grafana, ELK / EFK, Datadog – to monitor, troubleshoot, and optimise performance.
* Develop runbooks, dashboards, and operational documentation.
* Participate in on‑call rotations, minimising downtime and restoring service quickly during incidents.
* Drive DevOps and SRE best practices – including capacity planning, cost optimisation, and automation improvements.
* Continuously enhance reliability, performance, and automation within production systems.
Required Skills & Experience
* 3+ years of experience as an SRE or DevOps Engineer in production‑grade environments.
* Hands‑on experience with AWS / Azure / GCP.
* Strong Linux system administration and networking knowledge.
* Proficiency in scripting/programming (Python, Bash, Go).
* Experience with IaC tools (Terraform, Helm, CloudFormation, ARM templates).
* Familiarity with monitoring/logging tools (Prometheus, Grafana, ELK/EFK, Datadog).
* Understanding of cloud & container security best practices.
* Excellent communication, problem‑solving, and collaboration skills.
Preferred Qualifications
* Certified Kubernetes Administrator (CKA) or equivalent.
* Experience with Service Mesh (Istio, Linkerd), Ingress Controllers, and API Gateways.
* Exposure to multi‑cloud / hybrid environments.
* Familiarity with GitOps tools (ArgoCD, Flux).
* Knowledge of disaster recovery, backup, and business continuity planning.
Education
Bachelor’s Degree in Computer Science, Engineering, or a related field (or equivalent experience).
Mandatory Skills
* Monitoring & Observability
* Service Level & Error Budget Management
* Automation & Scripting
* Critical Incident Response
Why Join Us?
Work in a fast‑paced, cloud‑native environment with a focus on automation, reliability, and continuous improvement. You’ll collaborate with passionate engineers solving real‑world scalability and reliability challenges.