SRE Observability Engineer
Key Responsibilities:
The Monitoring and Observability team operates with a global footprint. Responsibilities of this role include:
* Collaborating across various organizations within Client to understand and develop observability solutions for enterprise-wide deployment at scale.
* Managing the legacy monitoring stack across the Production Management organization within Client.
* Driving the strategic delivery of end-to-end Observability solutions in Client.
* Providing in-depth analysis with interpretive thinking to define problems and develop innovative solutions.
* Directly impacting the business by influencing strategic functional decisions through advice, counsel, or provided services.
* Persuading and influencing others through strong and comprehensive communication and diplomacy skills.
* Performing other duties and functions as assigned.
Essential Skills:
* OpenShift/Kubernetes Administration: Experience deploying, managing, and troubleshooting containerized applications on OpenShift/Kubernetes, including resource management and networking.
* Grafana & Observability Stack:
  * Proficiency in administering ITRS Geneos at scale.
  * Proficiency in administering Grafana (user management, data sources, dashboards, alerts).
  * Working knowledge of Grafana backend components: Mimir (metrics), Loki (logs), and Tempo (traces).
  * Experience with Prometheus for metric collection and PromQL for querying.
* Helm Chart Management: Experience with Helm for deploying applications, including creating, modifying, and managing Helm charts, library charts, and dependencies.
* Technical Documentation: Ability to create clear and concise documentation for systems and processes.
Desired Skills:
* Application Deployment: Ability to deploy applications using Lightspeed Enterprise.
* Google Cloud Operations: Experience with Google Cloud operations.
* Scripting & Automation: Experience with Bash or Python scripting for automating operational tasks.