Lead site reliability engineer (snowflake/terraform/linux)

Cardiff
Partnerize
Site reliability engineer
€80,000 a year
Posted: 8h ago
Offer description

Who We Are

At Partnerize, we're on a mission to transform the way businesses grow. We've built the leading partnership automation platform that empowers brands to discover, engage, and convert their audiences at scale. From affiliate marketing to influencer collaborations, we help our clients build and manage profitable partnerships that drive real results. We're a team of passionate problem‑solvers who are dedicated to helping our clients win in the ever‑evolving world of digital marketing.


Why Join Us

We're looking for passionate, talented people who want to be part of a winning team. At Partnerize, you'll find a culture of collaboration, innovation, and respect. We're guided by our core values, and we're committed to creating an environment where everyone can do their best work. We also offer a competitive salary, generous benefits, and a flexible work environment that allows you to thrive both personally and professionally. If you're ready to grow your career and make a difference, we'd love to hear from you.


Job Summary

This is a captivating and exciting time to join Partnerize. We are at a pivotal point in our tech progression, looking to significantly expand our technical estate, scale the platform, and replace existing legacy systems with modern solutions. You will play a vital role in the ongoing operationalisation and management of our entire platform portfolio found in our on‑prem datacentres and AWS cloud: Partnerize, BrandVerity, Ascend, and our recent acquisition, Konnecto. While there will be a key focus on integrating and supporting Konnecto's advanced data and AI layers, a critical pillar of your mission will be spearheading the development of our enterprise on‑prem containerisation solution. This initiative is designed to fundamentally shift our engineering culture towards a "you build it, you own it" model. By providing robust, automated container platforms, you will empower our Engineering teams to deploy quickly and independently, significantly reducing the bottleneck created by relying solely on the TechOps department.

We are looking for a Lead SRE who is both a deep technical expert and a capable mentor. In this role, your primary responsibility is ensuring our diverse, hybrid systems remain available, scalable, and secure. You will act as an authoritative Subject Matter Expert (SME), championing developer autonomy, driving IT systems security policies, and working closely with the security compliance team to protect our platforms from threats while driving continuous integration and delivery. This role reports to the SRE and Application Manager, providing them with technical guidance and recommendations, and acts as the technical lead for the SRE team.


The Team

You will be responsible for ensuring the continuous development and progression of team members. We are looking for a player/coach who can mentor, empower, and up‑skill talent. We have a mix of technical generalists, specialists, and junior engineers; you will help identify their strengths and constructively develop areas of weakness, guiding their technological career paths as we transition to a DevOps‑centric operating model.


The Operational Reality

You will operate in a fast‑paced, high‑velocity environment where your work directly and visibly shapes the company's architectural future. This requires a highly adaptable and pragmatic leader who can balance strategic project delivery with hybrid‑estate maintenance. By applying modern incident management frameworks to troubleshooting and ticket management, you are responsible for ensuring all issues across our estate are addressed decisively and efficiently.


As a Lead SRE, You Will:

* Strategic & Operational Management
o Developer Empowerment & Containerisation
+ Collaborate on the design, build, and rollout of a robust containerisation strategy (Kubernetes/Docker). Your goal is to assist in delivering a platform that enables Engineering teams to take full ownership of their code from build to deployment.
o Reliability & Error Budgets
+ Define Service Level Indicators (SLIs), set Service Level Objectives (SLOs), and manage Error Budgets to balance feature velocity with platform stability.
o Hybrid Platform Engineering & Konnecto
+ Build software and systems to manage platform infrastructure across on‑prem and AWS. Take the lead technical role in integrating and modernising Konnecto's architecture, ensuring its data ingestion and AI logic layers scale securely.
o FinOps / Cloud Cost Optimisation
+ Manage, monitor, and optimise cloud infrastructure spend across our hybrid environments, ensuring architectural decisions are both highly performant and cost‑effective.
o CI/CD Pipeline Responsibility
+ Own the continuous integration and delivery pipelines, and their ongoing improvement, to facilitate rapid engineering velocity.
* People Leadership & Talent Development
o Mentorship
+ Deliver coaching sessions to the team and individuals, acting as a technical escalation point and fostering a culture of knowledge sharing.
o Workload Management
+ Scope the work coming into the SRE team, prioritise hybrid‑estate maintenance vs. project delivery, and delegate tasks to team members to ensure prompt resolution.
* Security & Architecture
o Design & Threat Modelling
+ Produce production‑grade application security designs. Perform design reviews and threat modelling of our services and products.
o Security Strategy
+ Drive improvements to Partnerize platforms' security through strategic planning, vulnerability assessments, and security testing.
* Incident Management & Toil Reduction
o Toil Reduction Champion
+ Reduce toil through automation, continually identifying manual, repetitive operational work and engineering it out of existence.
* Post‑Mortems & Escalation
o Act as the ultimate escalation point for complex support incidents, participate in the On‑Call rotation, lead blameless post‑mortems, conduct Root Cause Analysis (RCA), and aggressively track metrics like Mean Time To Recovery (MTTR).
* General Duties
o Consulting & Planning
+ Participate in system design consulting, platform management, and capacity planning.
o Escalation Support
+ Act as the ultimate escalation point for complex support incidents and assignments while maintaining a high level of quality.
o On‑Call
+ Participate in the On‑Call Rotation.


Essential Knowledge, Skills and Experience


Core Competencies

* Technical Ability
o Highly proficient SME capable of reliably applying technical methods, leading cultural technical shifts (e.g., DevOps adoption), and supporting the development of new skills in colleagues.
* Problem Solving & Decision Making
o Capable of making decisions quickly and decisively, weighing options, and approaching problems methodically and innovatively.
* Communication & Influence
o Effectively communicates initiatives to all stakeholders and is capable of procuring buy‑in for key transformational projects (like containerisation rollouts).


Technical Competencies

* Cloud, Hybrid & Containerisation
o Essential knowledge of hybrid architectures, managing both AWS and on‑premise environments. Extensive hands‑on experience designing and managing advanced containerisation environments using Docker, Kubernetes, and Argo Workflows to enable developer self‑service.
* Konnecto Tech Stack & Data Pipelines
o Proven experience managing modern storage layers and databases, specifically MongoDB and Snowflake. Experience supporting complex data ingestion layers, including clickdata streams, S3 raw/parsed ingestion, and Airflow ETL.
* Programming & Automation
o Experience in automation languages (Python or Bash).
o Deep understanding of GitHub and experience implementing or working alongside AI coding tools and practices.
o Knowledge of Infrastructure as Code (Terraform, Ansible).
* Security & Observability
o Experience with security in a DevOps environment. Experience managing observability stacks (e.g., Prometheus, Grafana, and Loki).
* Operations & Troubleshooting
o Exceptional Linux system administration skills. Highly proficient in troubleshooting, diagnosing, and independently solving issues using modern incident management frameworks.


Desirable Knowledge, Skills and Experience

* Innovation & Debt Management
o A keen interest in new technologies, specifically supporting development teams in the refactoring of technical debt.
* Legacy Databases
o Strong experience with relational databases (MySQL, PostgreSQL) and in‑memory data stores such as Redis.
* Data Streaming
o Experience with data streaming and queuing technologies, specifically Apache Kafka and Druid.
* Web & Storage
o Knowledge of Nginx (or other web server technologies) and storage technologies like Gluster.


UK Benefits & Perks

* 25 days holiday in addition to bank holidays
* Enhanced Parental Leave: 6 months at full pay for the birth parent, 4 weeks at full pay for the non‑birth parent, after one year of employment
* 5 extra 'Partnerize Parental Days' each year
* Private Medical Insurance through Vitality
* Enhanced pension contributions
* Cycle to Work scheme
* Eye Care Vouchers
* Life Assurance
* Enhanced Wellness Program including access to EAP, Wellness Coaching & Wellness Fridays program
* Regular company events and activities


Our Commitment to Diversity & Inclusion

We are committed to attracting, developing, and advancing our outstanding team members, regardless of race, ethnic identity, sexual orientation, religion, age, gender, gender identity, physical abilities, or any other dimension of diversity. We strive to foster an environment where people can be their authentic selves, raise concerns and innovate, all without fear; where they are treated fairly and respectfully, have equal access to opportunities and resources and can contribute fully to the organisation’s success. Every individual in our business is expected to live this commitment without exception.
