Job Description
We are seeking a skilled Senior Site Reliability & Platform Engineer to join our team.
Responsibilities
* Architect and operate scalable, secure infrastructure on Azure, using services such as AKS, Functions, SQL Database, and Cosmos DB.
* Apply SRE best practices including SLOs, alerting, observability, and incident management.
* Build and maintain Infrastructure as Code using Terraform (v1.7+) and GitOps workflows.
* Enhance CI/CD pipelines with security scanning, automation, and progressive delivery.
* Develop reusable tooling, modules, and self-service capabilities for development teams.
* Improve monitoring across systems using tools like Datadog, Grafana, ELK, and synthetic checks.
* Participate in the on-call rota, and lead incident response and post-mortems.
* Promote FinOps best practices and cost optimisation strategies.
* Contribute to internal platform governance, documentation, and coaching.
Experience Required
* Extensive experience with Azure services including AKS, Azure Functions, SQL Database, and Cosmos DB.
* Proficient in Infrastructure as Code using Terraform (v1.7+).
* Skilled in building and maintaining CI/CD pipelines, GitOps workflows, and automation scripting.
* Strong grasp of networking principles including TCP/IP, load balancing, DNS, and routing.
* Well-versed in DevSecOps methodologies including security scanning, IAM, and RBAC implementation.
* Hands-on experience with FinOps practices such as resource tagging, budgeting, and cost optimisation.
* Proficient in managing both Windows and Linux operating systems.
* Confident in supporting production environments, participating in on-call rotations, and collaborating across teams.
What You'll Get
* A competitive salary.
* Hybrid and flexible working arrangements.
* Generous holiday entitlement.
* Personal development budget.
* A 35-hour working week.
This is a key role within a modern, cloud-native environment where you will combine SRE principles with cutting-edge platform engineering. You will play a critical part in enabling product teams to ship faster, more securely, and more cost-effectively, while owning the health and evolution of the platform itself.