Overview
Visa is a world leader in payments technology, facilitating transactions between consumers, merchants, financial institutions, and government entities across more than 200 countries and territories. Visa is dedicated to uplifting everyone, everywhere by being the best way to pay and be paid.
At Visa, you'll have the opportunity to create impact at scale — tackling meaningful challenges, growing your skills and seeing your contributions impact lives around the world. Join Visa and do work that matters — to you, to your community, and to the world.
Progress starts with you.
Job Description
Site Reliability Engineering (SRE) is essential to Visa’s Cloud platform strategy. In this role, you’ll ensure our development platform and tools let engineers focus on innovation instead of infrastructure. You’ll promote observability best practices and automate resolution of recurring issues, working closely with software engineering teams to support security, availability, and performance. Responsibilities include triaging issues, collaborating on infrastructure management, and setting up monitoring for full coverage. Hands-on expertise is required, especially with major DevTools like GitHub, Jenkins, Jira, and Artifactory.
We seek a hybrid Software Engineer and SRE. The ideal candidate deeply understands at least one major DevTool, quickly resolves tool-related issues in collaboration with developers, and applies systems thinking to maintain reliable applications and infrastructure while improving developer productivity.
Responsibilities
Tools Support
* You will be the primary point of contact for developers using tools like GitHub, Jenkins, Jira, or Artifactory.
* Troubleshoot and resolve tool-related issues promptly to minimize developer downtime.
* Maintain and optimize CI/CD pipelines and integrations for reliability and scalability.
* Collaborate with development teams to improve workflows and automation.
Site Reliability Engineering
* Design, implement, and maintain systems for high availability, scalability, and performance.
* Monitor and improve application reliability through proactive measures and incident response.
* Develop and maintain observability solutions (metrics, logging, tracing).
* Participate in on-call rotations and drive root cause analysis for incidents.
Collaboration & Continuous Improvement
* Partner with engineering teams to identify reliability risks and implement best practices.
* Document processes, troubleshooting guides, and reliability playbooks.
* Advocate for automation and self-service solutions to reduce operational overhead.
Qualifications
Required Skills
* Approximately 3 years of experience in SRE and/or DevTools support roles.
* Proficiency managing at least one DevTool (GitHub, Jenkins, ArgoCD, Jira, Artifactory, Confluence).
* Strong understanding of CI/CD principles and pipelines.
* Solid knowledge of Linux systems, networking, and containerization (Docker/Kubernetes).
* Hands-on experience with cloud platforms.
* Programming/Scripting: Proficiency in Python, Go, Java, JavaScript, Ansible, GitHub Actions or similar.
* Mindset: Strong problem-solving skills, systems thinking, self-starter, and a passion for reliability.
Basic Qualifications
* Bachelor’s degree in IT, CS, or a related field and/or 3+ years of work experience in IT Operations and Delivery.
* Experience in 2+ of the following: Python, Java, Go, PowerShell, JavaScript, Terraform, Ansible, Helm, Chef, CloudFormation.
* Basic understanding of YAML, JSON, HTML, XML.
* Hands-on experience in Linux and/or Windows systems and understanding of distributed computing environments.
* 2 years of experience with Kubernetes workload management and GitOps.
* 2 years of experience with CI/CD tooling (Jenkins, GitHub, ArgoCD, Artifactory) in a large-scale environment.
* 2 years of experience with observability tooling (Grafana, Prometheus, Splunk, Datadog, New Relic, Dynatrace, Sentry) in a large-scale environment.
* 2 years of experience supporting relational and non-relational databases (MySQL, MongoDB, PostgreSQL, etc.), including queries, performance, and scaling.
* Experience managing container infrastructure and enabling a container-first model.
* This role requires on-call support as the team provides 24/7 operational support.
Preferred Qualifications
* 2+ years of experience in Platform, SRE, or Production Engineering for high-availability or critical platforms.
* Experience managing a distributed container platform including deployment/release management, provisioning, capacity management, and workload management.
Additional Information
Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.