Lead Site Reliability Engineer (Snowflake/Terraform/Linux)

Cambridge

Partnerize

Site reliability engineer

€80,000 a year

Posted: 10h ago

Offer description

Job Summary

This position is an exciting opportunity to help Partnerize scale its entire platform portfolio across on‑prem datacentres and AWS, including our legacy systems, BrandVerity, Ascend and the recently acquired Konnecto. You will lead the development of an enterprise on‑prem containerisation solution to shift our engineering culture toward a "you build it, you own it" model and enable rapid, independent deployment for our teams.

The Team

You will manage and develop a diverse group of technical generalists, specialists and junior engineers, acting as a player/coach who mentors, up‑skills and guides career paths as we transition to a DevOps‑centric operating model.

Operational Reality

The role operates in a fast‑paced, high‑velocity environment where you will shape the architectural future of the business. You will apply modern incident‑management frameworks to troubleshoot and manage tickets, ensuring all issues across our estate are resolved decisively and efficiently.

As a Lead SRE, You Will:

* Strategic & Operational Management

o Developer Empowerment & Containerisation – collaborate on the design, build and rollout of a robust containerisation strategy (Kubernetes/Docker) so Engineering teams can own code from build to deployment.
o Reliability & Error Budgets – define Service Level Indicators (SLIs), set Service Level Objectives (SLOs) and manage error budgets to balance feature velocity and platform stability.
o Hybrid Platform Engineering & Konnecto – build software and systems to manage infrastructure on‑prem and in AWS, and lead the integration and modernisation of Konnecto’s data ingestion and AI layers.
o FinOps / Cloud Cost Optimisation – monitor and optimise cloud spend across hybrid environments while ensuring high performance and cost effectiveness.
o CI/CD Pipeline Responsibility – continuously improve delivery pipelines to increase engineering velocity.
* People Leadership & Talent Development

o Mentorship – deliver coaching sessions, act as a technical escalation point and foster knowledge sharing.
o Workload Management – scope incoming work, prioritise maintenance vs. project delivery and delegate tasks to ensure timely resolution.
* Security & Architecture

o Design & Threat Modelling – produce production‑grade application security designs and perform threat modelling.
o Security Strategy – drive security improvements through planning, vulnerability assessments and testing.
* Incident Management & Toil Reduction

o Toil Reduction – automate repetitive operational work, systematically engineering it out of existence.
o Post‑Mortems & Escalation – act as the ultimate escalation point for complex incidents, lead blameless post‑mortems, conduct root cause analysis and track and improve metrics such as MTTR.
* General Duties

o Consulting & Planning – participate in system design, platform management, and capacity planning.
o Escalation Support – serve as an escalation point for complex incidents while maintaining a high level of quality.
o On‑Call – participate in the on‑call rotation.

Essential Knowledge, Skills and Experience

Core Competencies

* Technical Ability – a highly proficient SME capable of applying technical methods, leading cultural shifts such as DevOps adoption and developing skills in colleagues.
* Problem Solving & Decision Making – make decisions quickly and firmly, weighing options and applying methodical, innovative problem‑solving.
* Communication & Influence – effectively communicate initiatives to all stakeholders and secure buy‑in for transformational projects.

Technical Competencies

* Cloud, Hybrid & Containerisation – essential knowledge of hybrid architectures, AWS and on‑prem environments, and extensive hands‑on experience with Docker, Kubernetes and Argo Workflows.
* Konnecto Tech Stack & Data Pipelines – experience with MongoDB, Snowflake, clickstream data, S3 ingestion and Airflow ETL.
* Programming & Automation – proficiency in Python or Bash, deep understanding of GitHub, AI coding tools and practices, Terraform and Ansible.
* Security & Observability – experience managing DevOps security and observability stacks such as Prometheus, Grafana and Loki.
* Operations & Troubleshooting – exceptional Linux administration skills, incident management expertise and ability to diagnose and resolve issues independently.

Desirable Knowledge, Skills and Experience

* Innovation & Debt Management – interest in new technologies and refactoring technical debt.
* Legacy Databases – strong experience with MySQL, PostgreSQL and Redis.
* Data Streaming – experience with Kafka, Druid and other streaming/queuing technologies.
* Web & Storage – knowledge of Nginx and storage technologies like Gluster.

UK Benefits & Perks

* 25 days holiday + bank holidays
* Enhanced parental leave: 6 months full pay for birth parent; 4 weeks full pay for non‑birth parent after 1 year of employment
* 5 extra 'Partnerize Parental Days' each year
* Private medical insurance via Vitality
* Enhanced pension contributions
* Cycle to Work scheme
* Eye care vouchers
* Life assurance
* Enhanced wellness program – access to EAP, Wellness Coaching & Wellness Fridays
* Regular company events and activities

Our Commitment to Diversity & Inclusion

Partnerize is an equal‑opportunity employer and is committed to attracting, developing and advancing outstanding team members regardless of race, ethnic identity, sexual orientation, religion, age, gender, gender identity, physical abilities or any other dimension of diversity. We foster an environment where individuals can be authentic, raise concerns and innovate without fear.
