SRE

Belfast

Permanent

Ocho People

€70,000 a year

Posted: 19 January

Offer description

Site Reliability Engineer

We're working with a global technology consultancy that designs, builds, and supports modern software platforms for enterprise customers worldwide. They partner closely with clients to deliver reliable, scalable, cloud-native solutions.

The Role

As an SRE, you'll play a key role in ensuring the availability, performance, and scalability of production systems, supporting customers across the EMEA region. You'll also help to build, mature, and enhance the SRE function. This is a hands‑on, technical role focused on reliability, automation, and operational excellence across a distributed, cloud-based platform.

Key Responsibilities

* Platform Reliability: Deploy, operate, and improve Kubernetes clusters across multiple cloud environments.
* Service Performance: Design and implement processes to enhance system reliability, availability, and scalability.
* CI/CD Enablement: Build and optimise CI/CD pipelines to support safe, repeatable deployments.
* Observability & Incidents: Own monitoring, alerting, and incident response to minimise downtime and speed recovery.
* Root Cause Analysis: Lead post‑incident reviews and implement long‑term preventative improvements.
* Automation: Reduce operational toil through automation and performance optimisation.
* On‑Call: Participate in weekday coverage and a once‑monthly weekend rota.

Collaboration & Stakeholder Engagement

* Work closely with engineering, infrastructure, and product teams to embed SRE best practices.
* Advocate for reliability, resilience, and operational excellence across teams.
* Collaborate with a globally distributed engineering function.
* Engage directly with customers to resolve incidents and improve user experience.

Skills & Experience

* 5+ years of proven experience as an SRE or in a similar role, supporting complex distributed systems.
* Strong Kubernetes experience (AKS, EKS, GKE, or similar).
* Hands‑on with observability tools such as Prometheus, Grafana, Kibana, Vector, or Superset.
* Experience with at least one major cloud platform: AWS, Azure, GCP, or Linode.
* SQL database experience (PostgreSQL beneficial but not essential).
* Proficiency in Python, Go, or Rust.
* Strong Linux expertise, including performance tuning and troubleshooting.
* Excellent communication skills, with the ability to work effectively with engineers and customers.

Please apply now if you meet the above criteria, or contact Andrew Harrison directly.

