Site Reliability Engineer, Azure, GCP, Automation
A key customer of ours is seeking several SRE candidates to support a large-scale build-out and implementation across the GCP and Azure platforms. You’ll work within a multidisciplinary engineering squad, supporting the delivery, operation and continuous improvement of our cloud‑hosted services.
Responsibilities
* Support the reliability and performance of the cloud platforms your squad owns.
* Use observability tools, metrics, logs and traces to detect and prevent issues.
* Contribute to incident response, post‑incident reviews and problem management activities.
* Build automation that removes toil and improves operational efficiency.
* Work collaboratively with engineers, Product Owners and platform teams to balance delivery with operational health.
* Improve SLOs, error budgets and other product health measures.
* Take part in engineering ceremonies, knowledge sharing and squad‑wide improvement initiatives.
Technical Skills
* Experience with Azure and/or GCP public cloud platforms.
* Understanding of observability (metrics, logs, traces) and its impact on system health.
* Experience with GitHub pipelines and Terraform modules.
* Exposure to SRE principles such as SLOs, SLIs and error budgets.
* Ability to contribute to automation using Python, PowerShell, Terraform, CI/CD, or similar tools.
* Solid knowledge of modern engineering practices including DevOps, Infrastructure as Code and automation.
McGregor Boyall is an equal opportunity employer and does not discriminate on any grounds.