Job Description
There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.
Our team is globally located, focused on ensuring production stability, automation, reliability, and observability. We are looking for solution-oriented, commercially minded, customer-focused individuals used to working in an agile environment who want to be part of building something new from the ground up within a diverse and inclusive team.
Culture is important to us, and we seek intellectually curious, technology-passionate individuals eager to expand their skills while working on an exciting new venture. Your work will have a significant impact on our company, clients, and business partners worldwide.
As a Site Reliability Engineer III at JPMorgan Chase within Corporate Technology - Market Risk, you will address complex business problems with simple solutions. Using code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their infrastructure, contributing to the end-to-end reliability, scalability, and availability of your platform.
Job Responsibilities
Drive continuous improvement of reliability, monitoring, and alerting for mission-critical microservices.
Automate to reduce toil, creating reliable infrastructure and tooling to expedite feature development.
Develop and add metrics to microservices; define user journeys, SLOs, and error budgets; and set up dashboards and alerts.
Facilitate blameless post-mortems and ensure incident resolution.
Collaborate with development teams to develop software for reliability and scalability, designing self-healing and resiliency patterns.
Engage with various teams to influence application portfolio management.
Respond to incidents, providing support and insights alongside developers and engineers.
Design and implement deployment strategies using automated CI/CD pipelines.
Implement infrastructure, configuration, and network as code.
Understand SLIs and SLOs to proactively resolve issues, supporting SRE best practices.
Minimum Qualifications
Formal training or certification in SRE concepts.
Proficiency in at least one programming language such as Python.
Experience with a technology stack involving software design, coding, testing, and delivery.
Experience with Kubernetes and cloud platforms like AWS.
Expertise in solving complex, mission-critical problems across domains.
Strong debugging and troubleshooting skills.
Ability to work collaboratively and proactively recognize obstacles.
Experience with CI/CD and infrastructure-as-code tools such as Jenkins, GitLab, and Terraform.
Experience with observability tools such as Dynatrace, Datadog, New Relic, and CloudWatch.