You will be part of a specialist engineering team responsible for designing, building, and optimising end-to-end financial instrument mastering pipelines. These pipelines span ingestion, normalisation, bi-temporal processing, and publication into enterprise data platforms.
You will work closely with data architects, domain experts, and QC engineers to deliver scalable, reliable, and high-performance data solutions across Azure and Microsoft Fabric ecosystems.
Key Responsibilities
* Build and maintain PySpark-based data pipelines for financial instrument mastering across multiple data sources
* Design and implement bi-temporal data processing models (system time + valid time) including Slice, Resolve, Coalesce, and Diff logic
* Develop optimised Azure Cosmos DB data models, including partitioning, indexing, change feed processing, and point-read optimisation
* Integrate external APIs for entity resolution and matching services (PermID / IAAS) with robust retry and batching mechanisms
* Design publication pipelines to convert bi-temporal data into uni-temporal outputs and publish via Microsoft Fabric / Parquet-based lakehouse architectures
* Implement data quality frameworks using Great Expectations to ensure accuracy and compliance
* Build robust unit and integration tests using PyTest for PySpark and Cosmos DB components
* Support and maintain CI/CD pipelines (GitLab CI) including Python packaging, Artifactory deployment, and ARM-based infrastructure provisioning
* Work with YAML-driven configuration for mastering rules, schemas, and environment setup
* Monitor and troubleshoot production pipelines using Eventstream telemetry, KQL, and DataDog observability tools
* Deliver scalable transformation logic, optimised aggregations, and high-performance data processing workflows
* Implement data governance controls including data masking, role-based access, and compliance policies
* Continuously tune and optimise workloads for performance, cost efficiency, and reliability
Required Skills & Experience
* Strong experience in Python and PySpark (Spark SQL, DataFrame API, Structured Streaming)
* Hands‑on experience building large‑scale ETL / streaming data pipelines
* Experience working with Azure Cosmos DB (NoSQL) including data modelling and performance tuning
* Strong knowledge of Azure Data Lake Storage (ADLS / OneLake / ABFS)
* Experience implementing bi-temporal or SCD Type 2 data models
* Strong understanding of data quality frameworks (e.g., Great Expectations)
* Experience with CI/CD pipelines (GitLab / Azure DevOps) and automated deployments
* Strong testing discipline using PyTest, mocking, and integration testing approaches
* Experience working with YAML/JSON configuration and infrastructure-as-code (ARM templates)
* Strong understanding of distributed data processing and Spark-based architectures
* Experience working with financial or time‑series datasets (market data, reference data, risk data preferred)
* Strong communication skills and ability to work with cross‑functional stakeholders
Preferred Skills & Experience
* Microsoft Fabric (Notebooks, Eventstream, Lakehouses, Spark Job Definitions)
* Entity resolution / matching systems and enrichment APIs
* Delta Lake and Change Data Feed (CDF)
* Cosmos DB performance optimisation (RU tuning, bulk operations, concurrency)
* Jinja2 templating or code generation approaches
* SonarQube or similar code quality tooling
* Monorepo development with modern Python packaging tools (uv / Hatchling)
* Knowledge of regulatory and compliance standards (GDPR, SOX)
Tech Stack
Python 3.11+, PySpark 3.5, Spark SQL
Microsoft Fabric (Eventstream, Notebooks, Lakehouse)
Why Join
* Work on a global financial markets transformation programme
* Hands-on work with next‑generation Azure and Fabric data platforms
* Exposure to bi‑temporal modelling and financial instrument mastering systems
* High‑impact engineering role with modern cloud and streaming architecture
* Opportunity to work with leading domain and technical experts in a regulated environment