Senior Linux SRE
Outside IR35 - initial 12-month contract
Fully remote role across the UK / Europe
Our client, a consumer-facing tech business, is looking for a Senior SRE with a strong background in Linux infrastructure and third-party system operations. You’ll be responsible for running and optimising large-scale production environments (5,000+ hosts) built on technologies such as Kafka, Redis, Kubernetes and MySQL. This is a hands-on, systems-level position focused on reliability, scalability, performance and troubleshooting. You’ll work alongside experienced engineers, operating with a high degree of autonomy to keep critical systems healthy, resilient and observable.
Key Responsibilities
* Manage, configure and maintain Linux systems (CentOS, Rocky, RHEL or similar distributions) in production environments
* Install, upgrade and troubleshoot third-party systems including Kafka, Redis, Kubernetes and MySQL
* Support day-to-day operations in data centre / large-scale infrastructure environments (5,000+ hosts)
* Contribute to system reliability, scalability and performance improvements across the platform
* Participate in an on-call rotation (one week in every 4–5 weeks) to ensure 24/7 availability of critical systems
* Collaborate with internal teams to improve observability, monitoring and alerting across services
* Identify and implement operational improvements to existing monitoring, logging and incident response processes
* Use scripting and automation (primarily Bash and Python) to reduce toil and streamline recurring tasks
* Contribute to Infrastructure-as-Code practices using tools such as Ansible or Puppet
Required Experience & Skills
* 5+ years’ experience in Linux system administration, SRE, Infrastructure or Platform Engineering roles
* Proven experience operating large-scale infrastructure (thousands of hosts / distributed systems)
* Strong troubleshooting and performance tuning skills at the infrastructure and OS level
* Solid understanding of MySQL operations, including replication concepts
* Hands-on experience with Kafka and/or other distributed messaging systems
* Experience with Kubernetes or similar container orchestration platforms
* Practical scripting skills in Bash and/or Python for automation and tooling
* Familiarity with IaC tools such as Ansible or Puppet
* Good understanding of monitoring, alerting, logging and observability best practices
* Excellent communication skills and the ability to own incidents end-to-end, including post-incident reviews