At IBM Software, we transform client challenges into solutions, building the world’s leading AI-powered, cloud-native products that shape the future of business and society. Our legacy of innovation creates endless opportunities for IBMers to learn, grow, and make an impact on a global scale. Working in Software means joining a team fueled by curiosity and collaboration. You’ll work with diverse technologies, partners, and industries to design, develop, and deliver solutions that power digital transformation. With a culture that values innovation, growth, and continuous learning, IBM Software places you at the heart of IBM’s product and technology landscape. Here, you’ll have the tools and opportunities to advance your career while creating software that changes the world.
With Confluent, data doesn’t sit still. We put information in motion, streaming in near real time so organizations can react faster, build smarter, and deliver experiences as dynamic as the world around them.
Confluent Cloud processes millions of events per second across AWS, GCP, and Azure. When incidents happen in a multi-cloud streaming platform, they happen at scale: data in motion, exactly‑once semantics, and cascading failure modes that require deep systems thinking. We need an expert‑level engineer who can drive proactive reliability improvements that prevent these incidents before they occur.
This role combines hands‑on technical work with strategic program ownership. You’ll spend roughly 75% of your time on engineering: building automation, improving tooling, analyzing systemic failure patterns, and designing reliability improvements. The remaining 25% is teaching and coordination: coaching teams through post‑mortems, training incident commanders, and evolving our incident response practices.
You’ll be part of a global team with follow‑the‑sun coverage and clean handoffs that keep everyone working sustainable hours. This role sits within Cloud Architecture and Reliability – Supportability, a horizontal team that owns reliability standards and tooling across engineering. You’re the person who makes us need incident management less.
Master's Degree