Site Reliability Engineer – (SRE, Terraform, AKS, Azure, Kubernetes, PowerShell, Python, Bash, Datadog, Monitoring Tools) – Permanent – Remote

Charles Simon Associates are proud to partner with a global technology business, headquartered in Nottinghamshire, in their search for a Site Reliability Engineer.

This is an exciting opportunity to join a company that is passionate about modern reliability practices. They're looking for someone who shares their drive for excellence. If you're enthusiastic about automation, scalability, and building resilient systems, this role is for you.

Location: Remote, with occasional travel to Nottinghamshire HQ
Salary: Up to £75,000 per annum

Skills/Requirements for the Site Reliability Engineer:
* Proven experience in Site Reliability Engineering or similar roles
* Strong Terraform skills and hands-on experience with live environment deployments
* Solid Kubernetes and AKS expertise
* Familiarity with monitoring tools (Datadog preferred; Azure Application Insights, Log Analytics, and Grafana also considered)
* Scripting/automation skills (PowerShell, Python, Bash)
* Experience supporting web-based applications

Desirable:
* Exposure to microservices architectures
* Knowledge of Agile methodologies (Kanban, Scrum)
* Experience with tools such as Puppet or Chef

What You'll Do:
As a Site Reliability Engineer, you will:
* Design and implement SLOs, SLIs, and SLAs to align reliability with business needs
* Build and maintain incident response frameworks (runbooks, postmortems, blameless RCA)
* Enhance observability with tools such as Prometheus, Grafana, Datadog, and OpenTelemetry
* Manage infrastructure as code (Terraform, Pulumi, or CloudFormation) for consistent deployments
* Optimise cloud performance and costs through autoscaling, rightsizing, and lifecycle management
* Introduce chaos engineering practices to validate resilience and recovery strategies
* Champion cloud security best practices (secrets management, IAM policies, vulnerability scanning)
* Collaborate with DevOps and platform teams to create paved-road deployment patterns and internal developer portals
* Lead capacity planning and load testing to ensure systems scale effectively
* Contribute to architectural decisions around reliability, latency, and fault tolerance
* Share knowledge, mentor colleagues, and promote SRE culture across teams

Please send an up-to-date CV to be considered for this position.