Take on a critical role in shaping the future of a globally recognized firm, and make a direct and significant impact in a space built for top achievers in site reliability.
As a Lead Site Reliability Engineer at JPMorgan Chase within the Risk Technology Team, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on technical and business issues. You will lead resiliency design reviews, break down complex problems into manageable tasks for engineers, act as a technical lead for medium to large-sized products, and provide mentorship to other engineers.
Job responsibilities
1. Demonstrate and champion site reliability culture and practices, exerting technical influence across your team.
2. Lead initiatives to improve the reliability and stability of applications and platforms, utilizing data-driven analytics to enhance service levels.
3. Collaborate with team members to define service level indicators, establish service level objectives, and set error budgets with stakeholders.
4. Maintain high technical expertise in one or more domains, proactively identifying and resolving technology bottlenecks.
5. Serve as the main contact during major incidents, demonstrating skills to quickly identify and resolve issues to prevent financial losses.
6. Document and share knowledge within the organization through internal forums and communities of practice.
Required qualifications, capabilities, and skills
1. Formal training or certification in reliability, scalability, performance, security, enterprise architecture, and toil reduction, together with advanced applied experience.
2. Fluency in at least one programming language or framework, such as Python, Java (with Spring Boot), or Unix shell scripting.
3. Deep knowledge of software applications and technical processes, with emerging expertise in one or more technical disciplines.
4. Proficiency with observability tools such as Grafana, Geneos, Dynatrace, Prometheus, Datadog, and Splunk, including monitoring, SLO alerting, and telemetry collection.
5. Experience with CI/CD and infrastructure-as-code tools such as Jenkins, GitLab, and Terraform.
6. Experience with containerization and orchestration tools such as Docker, Kubernetes, and ECS.
7. Experience troubleshooting networking technologies and issues.
8. Ability to analyze and solve problems involving complex data structures and algorithms.
9. Self-motivated to learn and evaluate new technologies, with the ability to teach programming languages to team members.
10. Ability to collaborate across different stakeholder levels and groups.
11. Working knowledge of Apache, Tomcat, and TomEE.