Overview
We’re hiring a Senior Site Reliability Engineer to drive reliability, scalability, and performance across large-scale, mission-critical systems. This is a hands-on role where you’ll apply software engineering, automation, and observability to build resilient infrastructure and embed SRE best practices across teams.
Responsibilities
* Build and operate reliable, scalable, and secure infrastructure platforms
* Lead incident management, advanced fault analysis, and blameless retrospectives
* Lead a small team
* Drive automation and infrastructure-as-code to improve efficiency
* Champion observability, data-driven reliability, and continuous improvement
* Collaborate across engineering, architecture, and product teams
* Influence technical strategy and mentor other engineers
Qualifications
* Proven experience with SRE practices and strong programming skills
* Deep systems knowledge (cloud, OS, networking, automation)
* Strong problem-solving and communication skills, and the ability to influence stakeholders
* Passion for learning, experimentation, and operational excellence