Halian Technology is looking for a talented and driven Site Reliability Engineer (SRE) to join our growing technology team. In this role, you'll ensure the reliability, scalability, and performance of the digital platforms that power memorable customer experiences across the hospitality sector. You'll work alongside our engineering, product, and infrastructure teams to build high-availability systems and automated operations that support the future of digital hospitality.
Key Responsibilities:
1. Drive system reliability, availability, and performance through engineering excellence.
2. Design and implement monitoring, alerting, and observability tools using platforms like Datadog.
3. Automate operational tasks using scripting, Infrastructure as Code (IaC), and configuration management tools.
4. Troubleshoot incidents, lead root cause analysis, and improve Mean Time to Resolution (MTTR).
5. Partner with software engineers to integrate reliability best practices into the development lifecycle.
6. Build and maintain CI/CD pipelines to streamline deployments and rollbacks.
7. Ensure infrastructure meets security and compliance standards.
8. Optimise system resources for both performance and cost-effectiveness.
9. Contribute to incident response and participate in on-call rotations.
10. Track and improve key SRE metrics such as error rates, incident coun...