Role Title: Data Engineer - Kafka and Hadoop Expert (Python)
Location: Sheffield / Hybrid (60% office, 40% remote)
Duration: Until 30/11/2026
Key Responsibilities
* Design and build Kafka-based streaming applications (Kafka Streams/ksqlDB) in Scala/Python for transformation, enrichment, and routing.
* Implement end-to-end streaming pipelines: producers, stream processors, and consumers with strong data quality, idempotency, and DLQ patterns.
* Model topics, schemas, and contracts (Avro/Protobuf/JSON) and maintain backward/forward compatibility.
* Develop batch/stream interoperability: Spark/Structured Streaming jobs for aggregation, feature generation, and storage in Parquet/ORC.
* Integrate processed data into analytics/observability platforms (e.g., Splunk) for dashboards, alerting, and proactive insights.
* Build automated validation, replay, and backfill mechanisms to ensure reliability and SLA adherence.
* Apply observability to the pipelines themselves (metrics, traces, structured logs) and tune performance/cost.
* Collaborate with platform/infra teams who handle Kafka admin (brokers, security, ops) while owning application-side streaming logic.
* Ensure security and compliance for application data paths (authn/z, encryption in transit/at rest, secret management).
* Document data flows, schemas, and runbooks for streaming services.
Required Skills
* Kafka application development: Kafka Streams/ksqlDB, producer/consumer patterns, partitioning/serialization, exactly-once/at-least-once semantics.
* Languages: Strong in Scala and/or Python for streaming apps; familiarity with testing frameworks and CI for stream processors.
* Schema management: Avro/Protobuf/JSON, schema registry usage, compatibility strategies.
* Stream/batch processing: Spark (including Structured Streaming), Parquet/ORC, partitioning/bucketing, performance tuning.
* Data quality and reliability: Idempotent processing, DLQs, replay/backfill, lineage, and SLA-aware designs.
* Observability: Metrics/tracing/logging for stream apps; integration with downstream dashboards/alerts.
* Security/compliance: AuthN/Z in clients, TLS/SASL usage, secret management in code/services.
* Collaboration: Work closely with Kafka platform/admin teams while focusing on application-layer streaming logic; strong communication and documentation.