Role: GCP Data Architect
Location: Leeds/Halifax, UK (Hybrid)
Employment type: Contract
Role Summary
We are seeking an experienced Cloud Data Loading Architect to design, build,
and optimise automated pipelines that ingest structured, semi-structured, and
unstructured datasets into Google Cloud Platform (GCP), with BigQuery as the
primary target. This role will lead end-to-end data ingestion design, from
source discovery and schema mapping, through transformation and data quality,
to scalable, secure loads into cloud-native analytical warehouses.
The ideal candidate combines strong cloud engineering skills with hands-on data
integration experience and a deep understanding of BigQuery performance
optimisation.
Key Responsibilities
· Design and implement high-throughput, fault-tolerant ingestion pipelines for
batch and streaming data landing in BigQuery.
· Lead data ingestion architecture patterns using Cloud Storage (GCS),
Dataflow, Dataproc, Composer (Airflow), Pub/Sub, BigQuery Storage Write
API, and related services.
· Define data loading frameworks, mapping rules, schema evolution strategy,
and metadata management.
· Create reusable ingestion blueprints that ensure governance, lineage, and
auditability.
· Establish data quality checks, validation rules, reconciliation logic, and
SLAs.
· Optimise BigQuery cost, storage, partitioning, clustering, and access
patterns (an illustrative sketch follows this list).
· Collaborate with security & platform teams to ensure IAM, service
accounts, VPC Service Controls (VPC-SC), and encryption policies are fully
applied.
· Automate CI/CD deployments for ingestion pipelines using Cloud Build,
GitHub, GitLab, or Jenkins.
· Produce detailed technical documentation and coach engineering squads
in cloud ingestion standards.
· Troubleshoot ingestion failures, performance bottlenecks, and
cross-platform data integration issues.
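For flavour, here is a minimal Python sketch of the kind of partitioned, clustered batch load the role would design, using the google-cloud-bigquery client. All project, bucket, dataset, table, and column names are hypothetical placeholders, not part of any existing system:

```python
from google.cloud import bigquery

# Hypothetical project name, for illustration only.
client = bigquery.Client(project="my-analytics-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    # Day-partition on the event date and cluster on common filter
    # columns so typical analytical queries scan less data and cost less.
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="event_date",
    ),
    clustering_fields=["customer_id", "event_type"],
)

# Load Parquet files landed in GCS into the partitioned target table.
load_job = client.load_table_from_uri(
    "gs://my-landing-bucket/events/dt=2024-01-01/*.parquet",
    "my-analytics-project.analytics.events",
    job_config=job_config,
)
load_job.result()  # Blocks until the load job finishes; raises on error.
```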
Top 10 Skillset & Qualities (Ideal Candidate)
1. Deep Expertise in Google Cloud Data Services
BigQuery, GCS, Dataflow (Apache Beam), Pub/Sub, Dataproc, Cloud Composer,
Storage Write API.
2. Data Ingestion Engineering Mastery
Hands-on experience designing frameworks to load data from APIs, files,
databases, event streams, and mainframe/legacy systems into cloud stores.
3. Strong SQL & BigQuery Optimisation Skills
Partitioning, clustering, materialised views, cost-efficient query design, and
an understanding of columnar storage.
4. ETL/ELT Architecture Knowledge
Experience building transformation pipelines using Airflow, Dataflow, dbt, or
equivalent orchestration tools.
5. File & Format Proficiency
Ability to work with Parquet, Avro, ORC, JSON, CSV, nested/repeated structures,
and schema evolution.
6. Strong Python and/or Java Skills
Used to build Dataflow pipelines, ingestion utilities, and automation scripts
(a minimal sketch appears after this list).
7. Cloud Security & Governance Awareness
IAM roles, least-privilege models, VPC-SC, service accounts, artifact signing,
audit logging.
8. DevOps & CI/CD Familiarity
Cloud Build, GitHub Actions, Terraform, Cloud Deployment Manager or Pulumi.
9. Data Quality & Observability Mindset
Experience implementing validation frameworks, anomaly detection,
reconciliation rules, logging/monitoring (e.g., Cloud Logging, Cloud Monitoring).
10. Excellent Architectural Communication Skills
Ability to document, diagram, and communicate ingestion patterns to
stakeholders at technical and non-technical levels.
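To illustrate the Python/Dataflow side of the role, below is a minimal Apache Beam sketch of a streaming Pub/Sub-to-BigQuery ingestion pipeline. The topic, table, and schema are hypothetical placeholders, and a production pipeline would add error handling, dead-lettering, and schema management:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming mode is required for an unbounded Pub/Sub source.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        # Read raw event messages from a (hypothetical) Pub/Sub topic.
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            topic="projects/my-analytics-project/topics/events")
        # Decode the message bytes and parse each event as JSON.
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        # Append the parsed rows into a BigQuery table, creating it if absent.
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-analytics-project:analytics.events",
            schema="event_id:STRING,customer_id:STRING,event_ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )
```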