At CV-Library, we believe in creating a workplace where innovation thrives and every contribution matters. Our mission is to facilitate effective job matching and career development, not just for our users but also for our own team members. We are looking for a Site Reliability Engineer Lead to ensure our systems are reliable, scalable, and efficient.
As the Site Reliability Engineer Lead, you will take charge of maintaining the health and performance of our platforms while also leading a talented team of engineers. You will champion and coach best practices in reliability and operational excellence to deliver an exceptional experience for our users.
Key Responsibilities
* Minimise downtime to products and services and ensure the platform is stable.
* Drive and own the monitoring strategy, defining clear goals, objectives, and deliverables.
* Optimise and reduce operational overheads through observability and service automation.
* Lead the definition and tracking of Service Level Objectives (SLOs) to measure service availability, in collaboration with the service, product and engineering communities.
* Collaborate with product and engineering functions to ensure delivery and reliability outcomes are mutually agreed and achieved.
* Establish a framework and culture that drive continuous improvement of platform health, compliance and resiliency.
* Oversee the implementation of best practices for system monitoring, incident response, and problem resolution to ensure high availability and performance.
* Work with senior stakeholders to mature the concept of Site Reliability within the CVL organisation.
* Lead and mentor the SRE function, fostering a culture of collaboration, innovation, and excellence.
* Create a bridge between development and support teams by applying an 'as-a-service' mindset to system administration and management.
* Gain exposure to systems in both staging and production, and work with all technical teams.
* Ad hoc duties as and when required by line management.
Requirements
Essential Requirements:
* Strong problem-solving skills and the ability to think analytically.
* Ability to prioritise and manage multiple tasks in a fast-paced environment.
* Experience in software development, infrastructure, or operations roles.
* Strong background in, or appreciation of, observability principles, techniques and toolsets.
* Demonstrable knowledge of developing and managing RESTful API services written in a modern OO language such as Java or Python.
* Technical aptitude and passion for understanding complex distributed systems.
* Knowledge of languages such as PowerShell or C#.
* Understanding of, or experience working within, an Incident Management Process (ITSM).
Desirable Requirements:
* AWS
* Linux – Debian, CentOS, Alpine and Amazon Linux
* Terraform, Docker, Kubernetes, Git
* Observability/APM Platforms
* Jenkins, Nginx, MySQL
* Networking, Security
* Certification in Site Reliability Engineering (SRE) Foundation or AWS Certified DevOps Engineer – Professional would be beneficial
We are actively committed to promoting a fully diverse and inclusive workforce and we welcome applications for this role from all candidates who meet the key requirements.
Please do not hesitate to get in touch should you require any reasonable adjustments to assist with your application.
Due to the regular onsite requirement for this role, it would be most suitable for UK-based candidates. All applicants must already hold the Right to Work in the UK.
Benefits
* 25 days' annual leave, plus an additional day for your birthday!
* Regular team incentives and social events, including annual Christmas and Summer parties
* Discounts with major cinemas and retailers, family days out, and much more
* Life Insurance
* Company Pension
* Employee Assistance Programme (Mental Health & Well-being support)
* Great culture and work environment