Key Responsibilities:
Design, implement, and maintain data pipelines to ingest and process OpenShift telemetry (metrics, logs, traces) at scale.
Stream OpenShift telemetry via Kafka (producers, topics, schemas) and build resilient consumer services for transformation and enrichment (see the consumer sketch after this list).
Engineer data models and routing for multi-tenant observability; ensure lineage, quality, and SLAs across the streaming layer.
Integrate processed telemetry into Splunk for visualization, dashboards, alerting, and analytics to achieve Observability Level 4 (proactive insights); see the Splunk HEC sketch after this list.
Implement schema management (Avro/Protobuf), governance, and versioning for telemetry events (an Avro validation sketch follows this list).
Build automated validation, replay, and backfill mechanisms for data reliability and recovery (a replay sketch follows this list).
Instrument services with OpenTelemetry; standardize tracing, metrics, and structured logging across platforms (an instrumentation sketch follows this list).
Use LLMs to enhance observability capabilities (e.g., query assistance, anomaly summarization, runbook generation).
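
The sketch below illustrates the consumer pattern referenced above: a minimal enrichment service, assuming the kafka-python client and hypothetical topic names (openshift.metrics.raw, openshift.metrics.enriched); the real pipeline's topics, serialization, and commit strategy would differ.

```python
# Minimal sketch of a resilient enrichment consumer, assuming kafka-python
# and hypothetical topic names.
import json

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "openshift.metrics.raw",
    bootstrap_servers=["kafka:9092"],
    group_id="telemetry-enricher",
    enable_auto_commit=False,            # commit only after the event is re-produced
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers=["kafka:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    event = message.value
    # Enrich with tenant/cluster context before routing downstream.
    event["tenant"] = event.get("namespace", "unknown")
    event["source_partition"] = message.partition
    producer.send("openshift.metrics.enriched", value=event)
    consumer.commit()                    # at-least-once: commit after send is queued
```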
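
For the Splunk integration, a minimal sketch of forwarding an enriched event through the HTTP Event Collector (HEC); the host, token, index, and sourcetype values are placeholders.

```python
# Minimal sketch of sending an event to Splunk via HEC; URL and token are placeholders.
import json

import requests

SPLUNK_HEC_URL = "https://splunk.example.com:8088/services/collector/event"
SPLUNK_HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

def send_to_splunk(event: dict) -> None:
    payload = {
        "event": event,
        "sourcetype": "openshift:telemetry",
        "index": "observability",
    }
    resp = requests.post(
        SPLUNK_HEC_URL,
        headers={"Authorization": f"Splunk {SPLUNK_HEC_TOKEN}"},
        data=json.dumps(payload),
        timeout=5,
    )
    resp.raise_for_status()  # surface HEC errors to the caller
```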
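
For schema management, a minimal sketch of validating events against a versioned Avro schema, assuming the fastavro library; the schema fields are illustrative, not the actual telemetry contract.

```python
# Minimal sketch of Avro validation for telemetry events, assuming fastavro.
from fastavro import parse_schema
from fastavro.validation import validate

TELEMETRY_SCHEMA_V1 = parse_schema({
    "type": "record",
    "name": "OpenShiftMetric",
    "namespace": "telemetry.v1",
    "fields": [
        {"name": "cluster", "type": "string"},
        {"name": "tenant", "type": "string"},
        {"name": "metric_name", "type": "string"},
        {"name": "value", "type": "double"},
        {"name": "timestamp_ms", "type": "long"},
        # New fields with defaults keep later schema versions backward compatible.
        {"name": "labels", "type": {"type": "map", "values": "string"}, "default": {}},
    ],
})

def is_valid(event: dict) -> bool:
    # Returns False instead of raising so invalid events can be routed to a dead-letter path.
    return validate(event, TELEMETRY_SCHEMA_V1, raise_errors=False)
```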
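
For replay and backfill, a minimal sketch that rewinds a Kafka topic to a wall-clock timestamp, again assuming kafka-python; the topic name and error handling are simplified.

```python
# Minimal sketch of a timestamp-based replay for reprocessing a window of events.
from kafka import KafkaConsumer, TopicPartition

def replay_from(topic: str, start_ms: int, bootstrap: str = "kafka:9092"):
    consumer = KafkaConsumer(bootstrap_servers=[bootstrap], enable_auto_commit=False)
    partitions = [TopicPartition(topic, p)
                  for p in (consumer.partitions_for_topic(topic) or [])]
    consumer.assign(partitions)
    # Map the wall-clock start time to an offset on each partition, then seek there.
    offsets = consumer.offsets_for_times({tp: start_ms for tp in partitions})
    for tp, offset_and_ts in offsets.items():
        if offset_and_ts is not None:
            consumer.seek(tp, offset_and_ts.offset)
    for message in consumer:
        yield message  # hand each replayed record back to the normal pipeline
```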
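
For instrumentation, a minimal sketch of OpenTelemetry tracing and metrics setup using the opentelemetry-sdk package; spans are exported to the console here, whereas a real deployment would use an OTLP exporter.

```python
# Minimal sketch of OpenTelemetry tracing and metrics for an enrichment service.
from opentelemetry import metrics, trace
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
metrics.set_meter_provider(MeterProvider())

tracer = trace.get_tracer("telemetry.enricher")
meter = metrics.get_meter("telemetry.enricher")
events_counter = meter.create_counter(
    "events_enriched", unit="1", description="Events enriched and routed"
)

def enrich(event: dict) -> dict:
    # Each enrichment call becomes a span carrying tenant attributes.
    with tracer.start_as_current_span("enrich") as span:
        tenant = event.get("tenant", "unknown")
        span.set_attribute("tenant", tenant)
        events_counter.add(1, {"tenant": tenant})
        return event
```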