We’re seeking an exceptional Senior Data Engineer to design, build, and own the data pipeline that powers our payments expert agent. Your role: take raw, messy domain content – payments handbooks, regulatory documents, scheme rulebooks, project delivery history – and turn it into a production-grade retrieval layer that an AI agent can reason over reliably. You’ll own this end-to-end, from ingestion through to the RAG interface, in a regulated, high-stakes environment.
Responsibilities
You will own the data pipeline that turns regulated payments domain content into a reliable retrieval layer for our AI agent:
* Design, build, and scale the ingestion and processing pipelines that move payments domain content – handbooks, scheme rulebooks, regulatory documents, project history – into structured, retrievable knowledge.
* Engineer robust Retrieval-Augmented Generation (RAG) pipelines including chunking, embedding, vector storage, and retrieval, tuned for dense regulatory and technical content.
* Stand up cloud-native data infrastructure on Azure – Data Factory, Functions, Blob/ADLS, CosmosDB, and AI Search – with Python as the primary language.
* Embed engineering rigour into the pipeline – CI/CD with GitHub Actions and Azure DevOps, containerisation, observability, data quality checks, and re‑runnable workflows.
* Collaborate closely with the AI Engineer, Data Architect, and payments subject matter experts to translate messy domain content into a retrieval layer the agent can reason over.
Candidate Profile
* 5+ years of professional data engineering experience, with a strong track record of designing and operating production data pipelines.
* Deep hands‑on experience building ingestion and processing pipelines for unstructured and semi‑structured content (documents, transcripts, structured data), including parsing, chunking, and metadata enrichment.
* Strong proficiency in Python.
* Hands‑on experience with embedding models, Retrieval-Augmented Generation (RAG) architectures, and vector search (e.g. Azure AI Search, pgvector, or equivalent).
* Strong working knowledge of the Azure data stack – Data Factory or Synapse, Functions, ADLS/Blob, CosmosDB, AI Search – and comfort with containerised workloads.
* Comfortable making architectural decisions on incomplete information, and willing to revise them as the pipeline meets real data.
* A ‘T‑shaped’ engineering mindset, and a willingness to stretch into adjacent work – DevOps, retrieval evaluation, prompt tuning, backend APIs – when the work needs it.
Bonus Qualifications
* Experience with document processing at scale – OCR, layout‑aware parsing, or table extraction from PDFs.
* Background in fintech, banking, payments, or compliance industries.
* Experience building evaluation tooling for retrieval quality (recall@k, faithfulness, regression tests on a curated eval set).
Tech Stack
* Pipeline & processing core: Python, plus orchestration frameworks such as Azure Data Factory, Airflow, or Dagster.
* Retrieval-Augmented Generation (RAG) systems and Azure AI Search for vector and hybrid retrieval.
* Storage & data platform: Azure Blob/ADLS, CosmosDB, Azure AI Search, with FastAPI for the retrieval API layer.
* Ops & quality: GitHub Actions, Azure DevOps, Docker, with observability and data quality checks across the pipeline.
Benefits
* Up to 10% of annual earnings as a personal performance bonus
* Life insurance
* Group Income Protection
* Health Insurance for you and your family
* Dental Insurance for you and your family
* Pension: 4% employer and 4% employee; option to increase private pension contributions
* 28 days annual holiday plus public & bank holidays
* 7 days of sick leave per annum, paid at 100%
RedCompass Labs is committed to promoting and supporting a diverse and inclusive workplace, ensuring fair and equitable treatment for all.