Overview
The SRE Team Lead will lead a mature Site Reliability Engineering function within the Platform Operations Team, working closely with Platform Support and Engineering teams. This role demands strong thought leadership, technical depth, and strategic direction for the discipline, with a particular emphasis on leveraging AI-driven operations (AIOps) and FinOps practices to optimise reliability, performance, and cloud spend. Although this is a hands-on technical role, the SRE Team Lead will also manage a small team of SREs, providing clear direction and ensuring consistent, data-driven, AI-enhanced service delivery across the platforms while working collaboratively with existing support and engineering groups.
Responsibilities
* Apply core SRE and DevOps principles - culture, automation, testing, measurement, and continuous improvement - to build and optimise pipelines focused on rapid, reliable software delivery. Integrate AIOps capabilities, such as automated anomaly detection and intelligent alerting, to further enhance operational excellence.
* Work with Solutions Architecture, Development, and QA teams to automate processes wherever possible, creating and improving stable CI/CD pipelines for both software and infrastructure. Develop tools that enable rapid provisioning of environments and resources across all teams, incorporating AI-assisted automation where beneficial.
* Use automation, observability, and monitoring tools to improve site reliability and proactively identify issues. Support development teams with troubleshooting, particularly in infrastructure, networking, and multi‑tier application design. Serve as a subject matter expert for cloud services—especially AWS PaaS—while applying FinOps practices to ensure cloud cost transparency, optimisation, and efficient resource usage.
* Create and maintain robust technical documentation for the infrastructure of the English platforms, including operational runbooks enhanced with predictive and AI-supported insights.
* Stay engaged with developments in the SRE, DevOps, AIOps, and FinOps communities, continually introducing new practices and technologies to improve reliability, performance, automation, and cloud cost efficiency.
* This position has been classified as a hybrid role, requiring the selected candidate to typically spend 40-60% of their time collaborating and connecting face-to-face at their dedicated location. Aside from our hybrid principles, other flexible working requests will be considered from the first day of employment, including other work arrangements should you require adjustments due to a disability or long-term health condition.
Qualifications
* Demonstrable passion for Site Reliability Engineering and drive to understand, anticipate, and counter platform-related issues before they become problems; continually stay up to date with the latest technological trends and developments.
* Strong communication skills, with the ability to collaborate across technical leadership and various business stakeholders, presenting ideas and strategies clearly and persuasively.
* Soft skills in motivating, inspiring, and leading a team (direct line management is not part of the role’s remit).
* Educated to degree level or equivalent, with a minimum of 5 years' proven experience in a blended systems administration or DevOps role.
* Experience implementing technologies such as Terraform, GitHub Actions, and containerisation/orchestration (e.g., Docker and Kubernetes).
* Expertise in monitoring tools like New Relic, Grafana, Alertmanager, and Site24x7.
* Extensive knowledge of cloud computing infrastructure, especially using Amazon Web Services (EKS, ECS, RDS, Route 53, etc.).
* Excellent troubleshooting, debugging, communication, and documentation skills.
* Experience of working within an Agile product development environment.
About Cambridge University Press & Assessment
We are Cambridge University Press & Assessment, a world-leading academic publisher and assessment organisation and a proud part of the University of Cambridge. Joining us is your opportunity to pursue potential. You will belong to a collaborative team that is exploring new and better ways to serve students, teachers and researchers across the globe - for the benefit of individuals, society and the world. Sharing our mission will inspire your own growth, development and progress, in an environment which embraces difference, change and aspiration.
Cambridge University Press & Assessment is committed to being a place where anyone can enjoy a successful career, where it is safe to speak up, and where we learn continuously to improve together. We welcome applications from all candidates, regardless of demographic characteristics (age, disability, educational attainment, ethnicity, gender, marital status, neurodiversity, religion, sex, gender identity and sexual identity), culture, or social class or background. We believe better outcomes come through diversity of thought, background and approach. We welcome applications from people from all backgrounds and communities, and actively seek to employ people from a wide range of communities. If you are ready to take the next step in your Cambridge journey, we welcome your application. Together, we continue to shape a culture where everyone feels empowered to succeed and motivated to make a difference - for ourselves, for each other, and for learners worldwide.
Benefits
* 28 days annual leave plus bank holidays
* Private medical and Permanent Health Insurance
* Discretionary annual bonus
* Group personal pension scheme
* Life assurance up to 4 x annual salary
* Green travel schemes