Staff Site Reliability Engineer

Full-time

Job Family Group: Technology and Operations

Company Description

As the world's leader in digital payments technology, Visa's mission is to connect the world through the most creative, reliable and secure payment network - enabling individuals, businesses, and economies to thrive. Our advanced global processing network, VisaNet, provides secure and reliable payments around the world, and is capable of handling more than 65,000 transaction messages a second. The company's dedication to innovation drives the rapid growth of connected commerce on any device, and fuels the dream of a cashless future for everyone, everywhere. As the world moves from analog to digital, Visa is applying our brand, products, people, network and scale to reshape the future of commerce.

At Visa, your individuality fits right in. Working here gives you an opportunity to impact the world, invest in your career growth, and be part of an inclusive and diverse workplace. We are a global team of disruptors, trailblazers, innovators and risk-takers who are helping drive economic growth in even the most remote parts of the world, creatively moving the industry forward, and doing meaningful work that brings financial literacy and digital commerce to millions of unbanked and underserved consumers.

You're an Individual. We're the team for you. Together, let's transform the way the world pays.

Job Description

Site Reliability Engineering (SRE) is a critical part of our Visa Cloud platform strategy. As a Site Reliability Engineer, you work to maximize system availability through both proactive and reactive means: you build robust technical support and automation to eliminate or minimize incidents, and you investigate and resolve issues during live incidents. You are comfortable working with software engineering teams and supporting their demanding needs to ensure the security, availability and performance of the platform. You will join an established Cloudview - Site Reliability Engineering team.

Qualifications

Responsibilities:

As a Staff Site Reliability Engineer:

You will identify and support all site reliability requests related to Visa Cloud Platform services (IaaS/PaaS/Container as a Service)
You will lead, define and develop architectural approaches and infrastructure solutions to improve the availability, scalability, latency and efficiency of Visa Cloud Platform services
You will partner closely with software and systems engineers across the organization to ensure services/systems are highly stable and performant
You will mentor other team members on managing end-to-end availability and performance of mission-critical services while working on individual project priorities, deadlines, and deliverables
You will communicate clearly, with a strong sense of urgency and attention to detail

Basic Qualifications:

B.S. or higher in Computer Science or other technical discipline, or related practical experience
8+ years of experience in a Site Reliability or Production Engineering group supporting high-availability, critical platforms and applications
Strong hands-on experience patching and troubleshooting Linux and Windows systems
Expert knowledge of CI/CD and hands-on implementation experience
Hands-on experience with container-related technologies such as Docker and Kubernetes
Hands-on experience monitoring software and infrastructure with tools such as Prometheus, Grafana, Splunk and ELK
Comfortable living in the terminal, with the ability to script and debug in Shell/PowerShell
Working knowledge of relational and non-relational databases, including creating and running queries (MySQL and NoSQL)
Working knowledge of web/middleware servers like Nginx and Tomcat
Experience with configuration management tools such as Chef/Ansible
Experience working with ITIL disciplines (Event, Incident, Problem, and Change)
Have an urge to document all the things so you don't need to learn the same thing twice
Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it!