Senior site reliability engineer

Belfast

TechNET IT Recruitment Ltd

Site reliability engineer

Posted: 13 November

Offer description

Senior Linux SRE

Outside IR35: initial 12-month contract

Fully remote role across the UK / Europe

Our client is a consumer-facing tech business looking for a Senior SRE with a strong background in Linux infrastructure and third-party system operations. You’ll be responsible for running and optimising large-scale production environments (5,000+ hosts) built on technologies such as Kafka, Redis, Kubernetes and MySQL. This is a hands-on, systems-level position focused on reliability, scalability, performance and troubleshooting. You’ll work alongside experienced engineers with a high degree of autonomy to keep critical systems healthy, resilient and observable.

Key Responsibilities

* Manage, configure and maintain Linux systems (CentOS, Rocky, RHEL or similar distributions) in production environments
* Install, upgrade and troubleshoot third-party systems including Kafka, Redis, Kubernetes and MySQL
* Support day-to-day operations in data centre / large-scale infrastructure environments (5,000+ hosts)
* Contribute to system reliability, scalability and performance improvements across the platform
* Participate in an on-call rotation (one week every 4–5 weeks) to ensure 24x7 availability of critical systems
* Collaborate with internal teams to improve observability, monitoring and alerting across services
* Identify and implement operational improvements to existing monitoring, logging and incident response processes
* Use scripting and automation (primarily Bash and Python) to reduce toil and streamline recurring tasks
* Contribute to Infrastructure-as-Code practices using tools such as Ansible or Puppet

Required Experience & Skills

* 5+ years’ experience in Linux system administration, SRE, Infrastructure or Platform Engineering roles
* Proven experience operating large-scale infrastructure (thousands of hosts / distributed systems)
* Strong troubleshooting and performance tuning skills at the infrastructure and OS level
* Solid understanding of MySQL operations, including replication concepts
* Hands-on experience with Kafka and/or other distributed messaging systems
* Experience with Kubernetes or similar container orchestration platforms
* Practical scripting skills in Bash and/or Python for automation and tooling
* Familiarity with IaC tools such as Ansible or Puppet
* Good understanding of monitoring, alerting, logging and observability best practices
* Excellent communication skills and the ability to own incidents end-to-end, including post-incident reviews


Similar job

Site reliability engineer II

Belfast

CMETS CME Technology and Support Services Ltd.

Site reliability engineer

Similar job

Staff site reliability engineer

Belfast

CME Group

Site reliability engineer

€80,000 a year

Similar job

Site reliability engineer III (Tue to Sat)

Belfast

CME Group Inc.

Site reliability engineer