Job Description
Insight Global is seeking an Operations Site Reliability Engineer to provide global operational support for a leading infrastructure software company's customer-facing SaaS products. You will join a team of highly skilled engineers that manages mission-critical infrastructure and ensures round-the-clock (24x7x365) availability, performance, and security.
This SRE role involves monitoring, maintaining, and enhancing the availability and performance of production services. Responsibilities include driving automation to minimize failures and manual tasks, supporting stakeholder requests within agreed SLAs, and managing maintenance activities, critical systems, and release planning for production applications.
Must-haves:
* A degree in Systems Engineering, Computer Science, or a related field
* Professional experience in large cloud operations environments
* Experience administering Linux systems and working with various Linux distributions
* Operational experience with Amazon Web Services or Google Cloud Platform
* Proficiency with automation platforms to streamline repetitive tasks
* Strong scripting skills in Perl, Ruby, Python, or shell (e.g., Bash)
* Familiarity with deployment tools such as Ansible Tower and Jenkins
* Experience executing large-scale deployments to global infrastructure
* Experience in system/application administration in high-availability, customer-facing, large-scale environments