Role: Site Reliability Engineer
Location: London, UK (Hybrid)
Contract: 6-12 months, with possible extensions
Essential Skills:
OpenShift/Kubernetes Administration: Experience deploying, managing, and troubleshooting containerized applications on OpenShift/Kubernetes, including resource management and networking.
Grafana & Observability Stack:
Proficiency in administering ITRS Geneos at scale.
Proficiency in administering Grafana (user management, data sources, dashboards, alerts).
Working knowledge of Grafana backend components: Mimir (metrics), Loki (logs), and Tempo (traces).
Experience with Prometheus for metric collection and PromQL for querying.
Helm Chart Management: Experience with Helm for deploying applications, including creating, modifying, and managing Helm charts, library charts, and dependencies.
Technical Documentation: Ability to create clear and concise documentation for systems and processes.
Desired Skills:
Application Deployment: Ability to deploy applications using Lightspeed Enterprise.
Google Cloud Operations: Experience with Google Cloud operations.
Scripting & Automation: Experience with Bash or Python scripting for automating operational tasks.