Salary: £75,000 per year
Requirements:
* 3 years of experience in Site Reliability Engineering, DevOps, or equivalent roles is required.
* Strong skills in cloud-based infrastructure (Azure or AWS) using Infrastructure as Code (IaC) practices are essential.
* You should have hands-on experience building and managing CI/CD pipelines and developer tooling.
* A deep understanding of distributed systems and experience debugging complex technical issues are necessary.
* Proficiency in observability platforms like Datadog or similar is expected.
* Knowledge of security principles and of integrating security into infrastructure design is required.
* Proven experience with event-driven architectures and building highly available (HA) and disaster recovery (DR) compliant systems is required.
* A strong grasp of software development standards and practices such as Test-Driven Development (TDD) and Behavior-Driven Development (BDD) is important.
* Excellent collaboration and communication skills with a proactive and positive attitude are essential.
* A Computer Science degree or equivalent experience is required.
* Certifications in Azure, AWS, or relevant platforms are a plus.
* An interest in AI and emerging technologies is welcome.
Responsibilities:
* You will drive system reliability, availability, and performance through engineering excellence.
* Designing and implementing monitoring, alerting, and observability tools using platforms like Datadog will be part of your role.
* Automating operational tasks using scripting, Infrastructure as Code (IaC), and configuration management tools will be your responsibility.
* You will troubleshoot incidents, lead root cause analysis, and work to improve Mean Time to Resolution (MTTR).
* Partnering with software engineers to integrate reliability best practices into the development lifecycle is expected.
* Building and maintaining CI/CD pipelines to streamline deployments and rollbacks will be a key focus.
* Ensuring infrastructure meets security and compliance standards will be essential.
* You will also optimize system resources for both performance and cost-effectiveness.
* Contributing to incident response and participating in on-call rotations will be part of your duties.
* Tracking and improving key SRE metrics such as error rates, incident count, and monitoring coverage is necessary.
Technologies:
* AI
* AWS
* Azure
* CI/CD
* Cloud
* Datadog
* DevOps
* Support
* Security
* TDD
More:
- We are looking for a talented and driven Site Reliability Engineer (SRE) to join our growing technology team. In this role, you will ensure the reliability, scalability, and performance of our digital platforms that support memorable customer experiences across the hospitality sector. You will work alongside our engineering, product, and infrastructure teams to build high-availability systems and automated operations that support the future of digital hospitality. Apply now to join a forward-thinking technology team where reliability, innovation, and customer impact go hand in hand.
Last updated: week 37 of 2025