At CV-Library, we have a simple vision: to help the world to work. We are looking for exceptional and talented people to help us realise this vision in both UK and overseas markets. We are in a period of focused internal investment, following a year of key strategic acquisitions and significant investment across all parts of the business, from Tech and Data to People and HR. There's never been a more exciting time to join us or a better place to grow your career!

The Role

Hours: Monday-Friday, 9:00-17:30
Location: Fleet
Working Pattern: Hybrid - 1-2 days a week on site

As a Site Reliability Engineer, you will help design, operate and improve the reliability of our high-traffic web platforms and cloud-native services running across AWS and hybrid environments. As part of a small but highly impactful team, you will focus on system reliability, observability and operational automation, while contributing to the ongoing evolution of our infrastructure platform. You will work closely with engineering teams to improve system resilience, deployment velocity and operational maturity.

What your day-to-day will look like:

Manage and optimise AWS infrastructure including EC2, EKS, RDS, Aurora, S3, VPC, IAM, Route53 and CloudWatch
Improve system reliability, availability and resilience across production services
Define and improve operational practices including monitoring, alerting and incident management
Drive improvements in observability, metrics, logging and tracing
Participate in incident response and post-incident reviews, helping prevent recurrence
Contribute to capacity planning and performance optimisation
Operate and improve containerised workloads using Docker and Kubernetes
Maintain and evolve infrastructure supporting high-traffic production services
Automate operational workflows and reduce manual toil
Implement and improve CI/CD pipelines
Build and maintain Infrastructure as Code
Work closely with developers to improve deployment processes and system operability

What we're looking for (essential):

Strong, hands-on experience with Linux system administration and troubleshooting
Solid experience operating production services in AWS environments
Experience managing containerised platforms using Docker and Kubernetes
Strong understanding of networking fundamentals (TCP/IP, DNS, routing, TLS, load balancing)
Experience with monitoring, logging and observability platforms
Experience participating in incident management and operational support processes
Strong scripting and automation skills (e.g. Bash, Python, PowerShell)
Experience with AWS services such as EC2, EKS, RDS, Aurora, VPC and IAM
Experience managing databases in production environments (Aurora/MySQL)
Familiarity with hybrid cloud or virtualised environments
Good understanding of infrastructure networking and connectivity

We are actively committed to promoting a fully diverse and inclusive workforce and we welcome applications for this role from all candidates who meet the key requirements.
Please do not hesitate to get in touch should you require any reasonable adjustments to assist with your application.