Requirements:
Must have:
- Hands-on experience building streaming data pipelines with Kafka (producers/consumers, schema registry, Kafka Connect, ksqlDB, Kafka Streams); see the pipeline sketch after this list
- Proficiency with OpenShift/Kubernetes telemetry (OpenTelemetry, Prometheus) and CLI tooling
- Experience integrating telemetry into Splunk (HTTP Event Collector (HEC), Universal Forwarder (UF), sourcetypes, CIM), including building dashboards and alerting
- Strong data engineering skills in Python (or a similar language) for ETL/ELT, enrichment, and validation
- Knowledge of event schemas (Avro/Protobuf/JSON), data contracts, and backward/forward compatibility
- Familiarity with observability standards and practices, and the ability to drive toward Level 4 maturity (proactive monitoring, automated insights)
- Understanding of hybrid cloud and multi-cluster telemetry patterns
- Knowledge of security and compliance for data pipelines: secret management, RBAC, and encryption in transit and at rest
- Strong problem-solving skills and the ability to work in a collaborative team environment
- Strong communication and documentation skills
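To give a concrete flavor of the pipeline work above, here is a minimal sketch of a Kafka consumer that enriches telemetry events and forwards them to Splunk via HEC. It assumes the confluent-kafka and requests packages; the topic, consumer group, sourcetype, index, and environment variables are hypothetical placeholders, not values defined by this role.

```python
# Minimal sketch: consume telemetry events from Kafka and forward them to
# Splunk via the HTTP Event Collector (HEC). Topic, group, endpoint, token,
# sourcetype, and index below are hypothetical placeholders.
import json
import os

import requests
from confluent_kafka import Consumer

HEC_URL = os.environ.get(
    "SPLUNK_HEC_URL", "https://splunk.example.com:8088/services/collector/event"
)
HEC_TOKEN = os.environ["SPLUNK_HEC_TOKEN"]  # fails fast if the token is unset

consumer = Consumer({
    "bootstrap.servers": os.environ.get("KAFKA_BOOTSTRAP", "localhost:9092"),
    "group.id": "telemetry-to-splunk",            # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["openshift.telemetry.metrics"])  # hypothetical topic

def forward(event: dict) -> None:
    """Send one enriched event to Splunk HEC with an explicit sourcetype."""
    payload = {
        "event": event,
        "sourcetype": "openshift:telemetry",      # hypothetical sourcetype
        "index": "observability",                 # hypothetical index
    }
    resp = requests.post(
        HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        data=json.dumps(payload),
        timeout=5,
    )
    resp.raise_for_status()

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())           # assumes JSON-encoded events
        event.setdefault("cluster", "unknown")    # stand-in for real enrichment
        forward(event)
finally:
    consumer.close()
```

A production version of this loop would additionally sit behind schema-registry validation, batching, and retry/dead-letter handling rather than forwarding one event per HTTP call.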
Responsibilities:
- Design, implement, and maintain data pipelines that ingest and process OpenShift telemetry (metrics, logs, traces) at scale
- Stream OpenShift telemetry via Kafka (producers, topics, schemas) and build resilient consumer services for transformation and enrichment
- Engineer data models and routing for multi-tenant observability; ensure lineage, quality, and SLAs across the streaming layer
- Integrate processed telemetry into Splunk for visualization, dashboards, alerting, and analytics to achieve Observability Level 4 (proactive insights)
- Implement schema management (Avro/Protobuf), governance, and versioning for telemetry events
- Build automated validation, replay, and backfill mechanisms for data reliability and recovery
- Instrument services with OpenTelemetry; standardize tracing, metrics, and structured logging across platforms (see the instrumentation sketch after this list)
- Use LLMs to enhance observability capabilities (e.g., query assistance, anomaly summarization, runbook generation)
- Collaborate with platform, SRE, and application teams to integrate telemetry, alerts, and SLOs
- Ensure security, compliance, and best practices across data pipelines and observability platforms
- Document data flows, schemas, dashboards, and operational runbooks
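As an illustration of the OpenTelemetry instrumentation responsibility above, here is a minimal sketch of standardized tracing in a Python consumer service, assuming the opentelemetry-api and opentelemetry-sdk packages. It uses the console exporter purely for illustration; a real deployment would export via OTLP to a collector. The service, span, and attribute names are hypothetical.

```python
# Minimal sketch: standardized OpenTelemetry tracing for an enrichment service.
# Console exporter is used for illustration only; a real service would export
# via OTLP to a collector. Service/span/attribute names are hypothetical.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider(
    resource=Resource.create({"service.name": "telemetry-enrichment"})  # hypothetical
)
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

def enrich(event: dict) -> dict:
    """Wrap one enrichment step in a span so its latency and errors are traceable."""
    with tracer.start_as_current_span("enrich-event") as span:
        span.set_attribute("telemetry.cluster", event.get("cluster", "unknown"))
        event["enriched"] = True  # stand-in for real enrichment logic
        return event

if __name__ == "__main__":
    enrich({"cluster": "prod-east"})  # hypothetical sample event
```

Standardizing on one tracer provider and a shared attribute naming scheme like this is what makes traces, metrics, and structured logs correlatable across services.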
Company:
We are a technology-forward company committed to strengthening our observability capabilities and enabling our teams to deliver high-quality products. We focus on building and maintaining scalable data pipelines that process telemetry data reliably and in compliance with our security standards. You will work alongside skilled platform, SRE, and application teams in a collaborative environment with a strong emphasis on security and best practices. We offer competitive benefits, a supportive work culture, and room for professional development and career growth.