Data & Analytics Engineer (Databricks Lakehouse)
This requirement calls for a self-motivated, results-driven Data & Analytics Engineer to join our client at an exciting time, as they embark on a transformation of their global data capabilities. You will own the design, build, and optimisation of reliable data pipelines and well-governed analytical datasets on the Databricks Lakehouse. You will acquire and transform raw data (bronze) into high-quality, BI-ready models (gold), ensure data quality and lineage, and enable fast, secure analytics through SQL warehouses and Unity Catalog. The ideal candidate will work closely with key business stakeholders and our SI partners to gather requirements, optimise business processes, and deliver scalable solutions that drive business value.
Responsibilities
Data Engineering
* Design and implement batch/streaming pipelines (PySpark, Spark SQL, DLT, Auto Loader) across the bronze, silver, and gold layers.
* Implement CDC patterns and incremental loads; tune partitioning, Z-Ordering, and OPTIMIZE/VACUUM maintenance (see the sketch after this list).
* Build robust tests (unit/integration), data expectations, and observability (alerts/metrics, runbooks).
* Orchestrate pipelines using Databricks Jobs.
* Collaborate with the Platform team on CI/CD (Repos, Git, deployment pipelines, Terraform-managed Jobs/Infrastructure as Code).
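To give a flavour of the pipeline work, below is a minimal PySpark sketch of incremental bronze ingestion with Auto Loader followed by a CDC-style MERGE into a silver Delta table. All catalog, table, path, and key names (main.silver.orders, order_id, the /Volumes paths) are illustrative assumptions, not the client's actual schema.

```python
# Minimal sketch: Auto Loader bronze ingestion + CDC-style MERGE into silver.
# All catalog/table/path/key names are illustrative assumptions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Bronze: incrementally ingest raw files with Auto Loader (cloudFiles).
bronze_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/raw/_schemas/orders")  # hypothetical path
    .load("/Volumes/main/raw/orders")  # hypothetical path
    .withColumn("_ingested_at", F.current_timestamp())
)

def upsert_to_silver(batch_df, batch_id):
    """Apply the latest change per business key with a Delta MERGE."""
    latest = (
        batch_df
        .withColumn("_rn", F.row_number().over(
            Window.partitionBy("order_id").orderBy(F.col("_ingested_at").desc())))
        .filter("_rn = 1")
        .drop("_rn")
    )
    (DeltaTable.forName(spark, "main.silver.orders")  # hypothetical target table
        .alias("t")
        .merge(latest.alias("s"), "t.order_id = s.order_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(bronze_stream.writeStream
    .foreachBatch(upsert_to_silver)
    .option("checkpointLocation", "/Volumes/main/raw/_checkpoints/orders")  # hypothetical path
    .trigger(availableNow=True)  # incremental, batch-style run under a Databricks Job
    .start())
```

Periodic OPTIMIZE/ZORDER and VACUUM runs on the silver table would typically be scheduled as a separate maintenance job.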
Analytics Engineering
* Model curated gold datasets, dimensional models, and semantic layers for BI (Tableau); see the sketch after this list.
* Optimise SQL Warehouses for performance, concurrency, and cost; maintain versioned metric definitions.
* Document tables with catalog cards (owner, SLA, lineage, data contracts); enable discoverability.
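As an illustration of the modelling work, a gold-layer star schema might be materialised with Spark SQL along these lines; the main.gold schema, table names, and columns are assumptions made for the sketch.

```python
# Sketch: materialising a gold star schema from silver tables with Spark SQL.
# The main.gold schema, tables, and columns are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Conformed customer dimension with BI-friendly column names.
spark.sql("""
    CREATE OR REPLACE TABLE main.gold.dim_customer AS
    SELECT customer_id,
           customer_name,
           region,
           current_timestamp() AS refreshed_at
    FROM main.silver.customers
""")

# Order fact table keyed to the dimension, ready for the SQL warehouse/Tableau.
spark.sql("""
    CREATE OR REPLACE TABLE main.gold.fct_orders AS
    SELECT order_id,
           customer_id,
           order_date,
           amount
    FROM main.silver.orders
""")

# Catalog-card style documentation: owner and SLA recorded on the table itself.
spark.sql(
    "COMMENT ON TABLE main.gold.fct_orders IS "
    "'Gold order facts. Owner: analytics-eng. SLA: refreshed daily by 06:00 UTC.'"
)
```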
Governance & Security
* Implement access controls with Unity Catalog (see the sketch after this list).
* Adhere to naming conventions, tagging standards (owner, environment, cost centre, classification).
* Participate in access reviews, audit remediation, and privacy/compliance processes.
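A minimal sketch of the Unity Catalog controls referenced above, assuming a hypothetical `analysts` group and the main.gold.dim_customer table from the earlier sketch:

```python
# Sketch: Unity Catalog grants and row-level security via SQL.
# The `analysts` group, tables, and filter rule are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Group-based grants rather than per-user permissions.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.gold TO `analysts`")
spark.sql("GRANT SELECT ON TABLE main.gold.dim_customer TO `analysts`")

# Row-level security: non-admins only see EMEA rows (illustrative rule).
spark.sql("""
    CREATE OR REPLACE FUNCTION main.gold.region_filter(region STRING)
    RETURNS BOOLEAN
    RETURN is_account_group_member('admins') OR region = 'EMEA'
""")
spark.sql("ALTER TABLE main.gold.dim_customer "
          "SET ROW FILTER main.gold.region_filter ON (region)")
```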
Qualifications
Core Technical
* Databricks (Workspaces, Clusters/Policies, Pools, Jobs, SQL Warehouses, Repos)
* Spark / PySpark / Spark SQL with performance tuning (partitioning, caching, joins, shuffle optimisation).
* Delta Lake (ACID, time travel, OPTIMIZE, ZORDER, VACUUM, MERGE INTO for CDC).
* Delta Live Tables (DLT) and/or Structured Streaming for incremental/real-time pipelines.
* Unity Catalog: permissions model, lineage, data discovery, RLS/CLS patterns.
* CI/CD: Git (branching/PR flow), build/test pipelines, environment promotion; familiarity with Terraform (Databricks provider).
* Data Quality: expectations, validation frameworks, test automation (sketched below).
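For reference, DLT expectations of the kind listed above look roughly like this; the dataset names and rules are hypothetical, and the code runs only inside a Delta Live Tables pipeline.

```python
# Sketch: data-quality expectations in a Delta Live Tables pipeline.
# Dataset names and rules are hypothetical; runs only inside a DLT pipeline.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Silver orders with enforced quality expectations.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop failing rows
@dlt.expect_or_fail("positive_amount", "amount >= 0")          # fail the update
def silver_orders():
    # bronze_orders would be another dataset defined in the same pipeline.
    return (
        dlt.read_stream("bronze_orders")
        .withColumn("processed_at", F.current_timestamp())
    )
```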
Required Skills
* Dimensional modelling, star/snowflake schemas, and versioning of semantic models/metric definitions.
* SQL performance on analytical workloads; familiarity with Tableau (live connections and extracts) and other reporting tools.
Equal Opportunity Statement
We are committed to diversity and inclusivity in the workplace.