Overview
Design and build Kafka-based streaming applications (Kafka Streams/ksqlDB) in Scala and/or Python for transformation, enrichment, and routing. You will own application-side streaming logic end to end: modeling topics, schemas, and contracts; implementing reliable pipelines with data-quality, idempotency, and DLQ patterns; developing batch/stream interoperability with Spark/Structured Streaming; and surfacing processed data in analytics and observability platforms such as Splunk. Platform/infra teams handle Kafka administration, while you focus on the application layer, including security, observability, performance/cost tuning, and documentation of the streaming services you build.
Responsibilities
* Design and implement Kafka-based streaming applications (Kafka Streams/ksqlDB) in Scala and/or Python for transformation, enrichment, and routing.
* Develop end-to-end streaming pipelines: producers, stream processors, and consumers with data-quality, idempotency, and DLQ patterns.
* Model topics, schemas, and contracts (Avro/Protobuf/JSON) and maintain backward/forward compatibility.
* Develop batch/stream interoperability: Spark/Structured Streaming jobs for aggregation, feature generation, and storage in Parquet/ORC.
* Integrate processed data into analytics/observability platforms (e.g., Splunk) for dashboards, alerting, and proactive insights.
* Build automated validation, replay, and backfill mechanisms to ensure reliability and SLA adherence.
* Apply observability to the pipelines (metrics, traces, structured logs) and optimize performance and cost.
* Collaborate with platform/infra teams handling Kafka administration while focusing on application-side streaming logic.
* Ensure security and compliance for data paths (authN/authZ, encryption in transit and at rest, secret management).
* Document data flows, schemas, and runbooks for streaming services.
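As an illustration of the idempotency and dead-letter-queue patterns named above, here is a minimal, Kafka-free sketch in Python. All names (`route_record`, `processed_ids`, the `event_id`/`payload` fields) are assumptions for the example, not part of any real pipeline or API:

```python
def route_record(record: dict, processed_ids: set) -> str:
    """Decide where a record goes: 'skip' (duplicate), 'dlq' (invalid), or 'out'."""
    # Idempotency: drop records whose event_id was already processed,
    # so replays and redeliveries do not produce duplicate output.
    if record.get("event_id") in processed_ids:
        return "skip"
    # Data-quality gate: malformed records are routed to a dead-letter
    # queue instead of failing the whole stream.
    if "event_id" not in record or "payload" not in record:
        return "dlq"
    processed_ids.add(record["event_id"])
    return "out"

# Simulate a small batch containing a duplicate and a malformed record.
seen: set = set()
batch = [
    {"event_id": "a1", "payload": {"v": 1}},
    {"event_id": "a1", "payload": {"v": 1}},  # duplicate -> skip
    {"payload": {"v": 2}},                    # missing event_id -> dlq
]
routes = [route_record(r, seen) for r in batch]
```

In a real Kafka Streams or consumer application, the same decision logic would sit inside the processing loop, with the `dlq` branch producing to a dedicated dead-letter topic and the deduplication set backed by a state store rather than in-process memory.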
Qualifications
* Kafka application development: Kafka Streams/ksqlDB, producer/consumer patterns, partitioning/serialization, exactly-once/at-least-once semantics.
* Languages: Strong proficiency in Scala and/or Python for streaming apps; familiarity with testing frameworks and CI for stream processors.
* Schema management: Avro/Protobuf/JSON, schema registry usage, compatibility strategies.
* Stream/batch processing: Spark (including Structured Streaming), Parquet/ORC, partitioning/bucketing, performance tuning.
* Data quality and reliability: Idempotent processing, DLQs, replay/backfill, lineage, and SLA-aware designs.
* Observability: Metrics/tracing/logging for stream apps; integration with downstream dashboards/alerts.
* Security/compliance: AuthN/Z in clients, TLS/SASL usage, secret management in code/services.
* Collaboration: Work with Kafka platform/admin teams; strong communication and documentation.
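The exactly-once/idempotent-producer semantics listed above are largely a matter of client configuration. Below is a hedged sketch of producer settings using standard librdkafka configuration keys (as consumed by clients such as confluent-kafka-python); the broker address and `transactional.id` value are placeholders:

```python
# Producer settings for idempotent, transactional (exactly-once) delivery.
# Keys are standard librdkafka config names; values here are illustrative.
producer_config = {
    "bootstrap.servers": "localhost:9092",      # placeholder broker address
    "enable.idempotence": True,                 # broker dedupes per producer session
    "acks": "all",                              # wait for all in-sync replicas
    "transactional.id": "orders-enricher-1",    # placeholder; enables transactions
}
```

With these settings, produced records are deduplicated by the broker on retry, and wrapping sends in transactions extends the guarantee across multiple topics/partitions.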