Responsibilities
* Observability & Monitoring Platform
    * Design, implement, and own an Azure observability playbook, delivering comprehensive dashboards, alerting rules, and operational runbooks using Application Insights, Log Analytics, and Kusto Query Language (KQL).
* AIOps & Intelligent Automation
    * Develop AI‑driven alerting and detection mechanisms that surface early‑warning signals, including IP reputation degradation, database capacity saturation, and anomalous traffic patterns, so that issues can be remediated before they become incidents.
* Release Assurance & Operational Readiness
    * Define and enforce Operational Acceptance Testing (OAT) gates for all production deployments, ensuring releases meet agreed reliability, performance, security, and operability standards prior to go‑live.
* Platform Hygiene & Cost Governance
    * Perform regular audits of the Azure tenant to identify and remove orphaned, unused, or misconfigured resources, reducing operational risk and controlling cloud spend.
* Infrastructure as Code & CI/CD Enablement
    * Build and maintain reusable Terraform modules and safeguard the integrity of CI/CD pipelines built on GitHub Actions, enabling consistent, repeatable infrastructure deployments within multi‑subscription hub‑and‑spoke network architectures.
The Ideal Candidate
* Experience
    * 6+ years in DevOps, SRE, or Platform Engineering roles, including hands‑on experience operating and supporting high‑traffic production Azure environments at scale.
* Infrastructure as Code Expertise
    * Deep proficiency in Terraform, including module design, remote state management, workspace strategies, and multi‑environment deployment patterns.
* Monitoring & Observability Expertise
    * Advanced experience writing KQL against Azure Log Analytics, and the ability to design and build custom Azure Monitor Workbooks for operational insight and reporting.
* Security Automation
    * Strong background in security automation, including passwordless authentication using OpenID Connect (OIDC), Azure Key Vault integration, and best practices for secrets management and identity‑driven access control.
* Azure Platform Knowledge
    * In‑depth understanding of Azure Container Apps (ACA), virtual network integration, and Private Endpoint configuration to support secure, network‑isolated workloads.