My client is transforming observability with a modern, full-stack platform that delivers logs, metrics, traces, and security monitoring, cutting costs by up to 70% while boosting efficiency.
They are looking for a Lead SRE to own and elevate their Alerting & Incident Management platform. You'll be the driving force behind reliability, customer satisfaction, and product excellence, ensuring smooth alert management, fewer engineering interruptions, and a best-in-class incident response experience.
This role blends technical depth, customer impact, and product strategy. It's a great fit for someone who thrives at the intersection of engineering, incident response, and product innovation.
What You’ll Do
* Champion customer experience by speeding up alert resolution and reducing interruptions for engineers.
* Build solutions to common customer pain points and shape roadmaps, documentation, and technical knowledge.
* Develop benchmarking tools to improve performance, reliability, and scalability.
* Stay ahead of incident management trends to drive new workflows and product improvements.
* Mentor teams and lead with clear, impactful communication.
What They're Looking For
* 5+ years in software engineering, DevTools, or infrastructure.
* Strong expertise in incident management, alert routing, and large-scale orchestration.
* SaaS or incident management platform experience (experience with PagerDuty, OpsGenie, or similar tools is a plus).
* Solid technical foundation in cloud and distributed systems.
* Excellent communicator, comfortable working across US/IL time zones.
* Bonus: leadership experience, SRE/DevOps background, knowledge of SLO/SLA practices.