Overview
We are seeking a skilled OpenShift Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for the reliability, availability, and performance of our OpenShift-based virtualization and container platforms and services, with a focus on automation. You will work across teams such as Applications, Hardware, and Network; develop secure service architecture using cloud-native technologies; and build systems, primarily in Shell scripting, YAML, Ruby, Python, and Go, that prevent outages through automated scanning and remediation. You will also establish and enforce SRE best practices through platform constraints and high-fidelity system modeling, and participate in an on-call rotation.
Responsibilities
* Ensure reliability, availability, and performance of OpenShift-based platforms and services with a focus on automation.
* Collaborate across teams such as Applications, Hardware, and Network.
* Develop secure service architecture using cloud-native technologies.
* Develop systems primarily in Shell scripting, YAML, Ruby, Python, and Go to prevent outages through automated scanning and remediation.
* Establish and enforce SRE best practices through platform constraints and high-fidelity system modeling.
* Participate in an on-call rotation.
Required skills
* Hands-on experience with OpenShift Virtualization and Kubernetes administration.
* Understanding of distributed systems and their common failure domains; experience managing a production service on Red Hat, Windows, and ESXi.
* Strong knowledge of Linux systems and networking.
* Experience with monitoring, logging, alerting, and observability tools (e.g., OpenTelemetry, Prometheus, Grafana, Splunk).
* Proficiency in Python, Shell scripting, and Go, and with infrastructure-as-code tools such as Terraform.
* Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI).
* Understanding of containerization (Docker) and microservices architecture.
* Experience with Ansible for configuration management and deployment.
* Good problem-solving and communication skills.
Soft skills
* Experience in, and an affinity for, improving team performance.
* A constructive mindset, professional behaviors, and self-mastery.
* Proven experience with compute, OpenShift, Kubernetes, hypervisors, storage, Windows, networking, and Linux.
* Work with industry groups and vendors outside the Client to establish and maintain the Client's involvement and influence.
* Take accountability for the control and compliance of the engineering process.
* Promote innovation and adoption of cutting-edge specialist technologies and practices within the domain.
* Promote development of engineers through coaching and mentoring.
* Consult in other areas as required, offering assistance and a different perspective to programs or projects that need it.