There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission‑critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within the Chief Technology Office, you will solve complex and broad business problems with simple and straightforward solutions. We are seeking a Site Reliability Engineer (SRE) to help drive reliable, scalable, and intelligent platform operations in a global financial environment. This role combines technical support, DevOps practices, and SRE principles, including on‑call incident response, automation, and a customer‑first mindset. You will work with modern tools to ensure our applications and services remain robust and available.
Job Responsibilities
* Collaborate with engineering, support, and operations teams to maintain and improve the reliability of mission‑critical applications.
* Participate in incident management, troubleshooting, and continuous improvement.
* Help implement automation and monitoring solutions.
* Join an on‑call rotation and act effectively during production incidents.
* Share knowledge, follow best practices, and contribute to a culture of learning and innovation.
* Communicate clearly, solve problems proactively, and focus on customer needs.
Required qualifications, capabilities and skills
* Formal training or certification in SRE and application support concepts, plus proficient applied experience.
* SRE & Application Support: Experience in SRE, DevOps, or application support roles, with knowledge of SLIs/SLOs, incident response, and troubleshooting.
* Observability & Monitoring: Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Splunk, OpenTelemetry).
* DevOps Tooling: Hands‑on experience with CI/CD pipelines (Jenkins, including global libraries), infrastructure as code (Terraform), version control (Git), containerization (Docker), and orchestration (Kubernetes).
* Cloud & Automation: Exposure to cloud platforms (AWS, GCP, or Azure) and automating infrastructure and deployments.
* On‑Call & Incident Management: Willingness to participate in on‑call rotation and respond to production incidents.
* Problem Solving & Communication: Ability to break down issues, document solutions, and communicate effectively with team members and customers.
Preferred qualifications, capabilities and skills
* Financial/Regulated Experience: Experience in banking, fintech, or regulated environments.
* Resilience Engineering: Participation in game days or chaos engineering.
* Mentorship: Interest in sharing knowledge and best practices with peers.