Lead Data Acquisition Engineer - UK Commercial Energy Platform

London
APEXION
Commercial
Posted: 13h ago
Job Description

We’re building a national-scale data platform for UK commercial energy.

At the core is a unified view of every commercial building in the UK, and an estimate of annual energy consumption and load profile for each occupant.

We’ve already built the core spine (AddressBase, VOA, leases, CCOD/OCOD, INSPIRE, planning, NNDR, EPC, permits, renewables, Companies House). Now we need someone to own data acquisition and occupant modelling on top of this.

ROLE

Lead Data Acquisition Engineer – UK Commercial Energy Platform

Type: Full-time or long-term contract

Location: Remote (UK or Europe timezone preferred)

WHAT WE’VE BUILT SO FAR

Our current building/occupant spine includes:

• OS AddressBase Core as the UPRN spine

• VOA valuation and floor area data

• Long leases

• CCOD / OCOD + INSPIRE polygons

• Planning application data and NNDR (where available)

• EPC non-domestic data

• Environment Agency & DEFRA permitting datasets

• UK coverage of existing renewable projects

• Companies House API linkage

Your job is to sit on top of this spine and turn it into something truly useful for per-occupant energy modelling.

THE PROBLEM YOU’RE SOLVING

For each of ~2 million UK commercial buildings we want to know:

• Who the actual occupant(s) are

• How they operate in detail

• What that implies for energy use and load shape

A plastics manufacturer is not the same as a frozen food warehouse, an office, or a logistics hub.

We care about:

• What they manufacture or do

• What machinery they have on-site

• What processes they run, and when they run them

This is not a one-off scrape. It’s a systematic, repeatable pipeline that touches millions of rows.

WHAT YOU WILL DO

Own data acquisition and scraping

Design and run scraping / ingestion pipelines for:

• DNO and other network datasets

• Government and regulator datasets

• Company-level and facility-level data beyond Companies House

• Public signals of operations: websites, “our plant” pages, datasheets, job ads, fleet pages, Google Maps / Street View, industry directories, etc.

Build robust scrapers at scale:

• Parallelisation, retries, throttling, proxy management, error handling

• Logging and monitoring so we know what ran, what failed, and why
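
To make the retry and logging points above concrete, here is a minimal sketch in Python. It is illustrative only and not our production code: the function name `fetch_with_retries`, its parameters, and the logger name are all assumptions, and the `fetch` callable stands in for whatever HTTP client (requests, httpx, etc.) a real pipeline would use.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("acquisition")  # hypothetical logger name


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying with exponential backoff on any exception.

    `fetch` and `sleep` are injected so the logic can be tested offline.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            body = fetch(url)
            logger.info("ok url=%s attempt=%d", url, attempt)
            return body
        except Exception as exc:
            # Record what failed and why, so a run can be audited afterwards.
            logger.warning("failed url=%s attempt=%d error=%r", url, attempt, exc)
            if attempt == max_attempts:
                raise
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` also gives a natural hook for throttling: a per-domain rate limiter could be passed in place of `time.sleep` without changing the retry logic.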

Resolve who actually occupies each building

Extend our NNDR-based approach and close the gaps:

• Link buildings to occupants using NNDR, Companies House, planning & permitting data, web presence and other public sources

Build an entity resolution pipeline that:

• Normalises and matches company names and addresses

• Uses fuzzy matching with confidence scores

• Maintains a master building-to-occupant table with history and provenance
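
As a rough illustration of the normalise-then-match step, the sketch below uses only the Python standard library. The suffix list, the function names and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions made for the example; a real pipeline would likely use a fuller suffix list and a dedicated record-linkage library.

```python
import re
from difflib import SequenceMatcher

# Illustrative, incomplete list of legal suffixes to drop before matching.
_LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp"}


def normalise_name(name: str) -> str:
    """Lower-case, strip punctuation and drop common legal suffixes."""
    name = name.lower().replace("&", " and ")
    tokens = re.sub(r"[^a-z0-9 ]", " ", name).split()
    return " ".join(t for t in tokens if t not in _LEGAL_SUFFIXES)


def normalise_postcode(postcode: str) -> str:
    """Upper-case and re-space a UK postcode (the inward code is always 3 characters)."""
    compact = re.sub(r"\s+", "", postcode).upper()
    return f"{compact[:-3]} {compact[-3:]}" if len(compact) >= 5 else compact


def match_confidence(name_a: str, name_b: str) -> float:
    """Similarity in [0, 1] between two company names after normalisation."""
    return SequenceMatcher(None, normalise_name(name_a), normalise_name(name_b)).ratio()
```

For example, `match_confidence("ACME Plastics Ltd.", "Acme Plastics Limited")` returns 1.0 because both names normalise to the same string. In practice the score would be stored alongside the source of each link, so the master table keeps provenance for every match.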

Engineer occupant-specific, process-level variables

For each building occupant, design and populate variables that matter for energy, for example:

• Industry and sub-industry (SIC + text classification)

• Building function / process type:

– Manufacturing vs distribution vs office vs retail

– Plastics vs food vs metals vs pharma, etc.

– Cold storage, data centre, heavy process, light assembly

• Operational characteristics:

– Opening hours and shift patterns

– 24/7 vs office hours

– Indicative vehicle and truck movements

– Refrigeration, compressed air, process heat, HVAC type

• Machinery and equipment indicators, where possible:

– Presence of large motors, injection moulders, CNC machines, presses, ovens, kilns, furnaces, chillers, freezers, compressors, data centre racks, etc.

– Signals from permits, product specs, job adverts (“CNC milling centre”, “ammonia refrigeration plant”), site photos, equipment lists, OEM case studies and similar

Join all of this back to:

• VOA dimensions

• EPC primary energy and HVAC/fuel indicators

• Scope 2 and emissions disclosures where available

The key is depth and uniformity. A cold-storage warehouse will have different variables from a law firm, and a plastics injection-moulding plant different again – but everything must land in a consistent, model-ready structure across ~2M rows.
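
One simple way to picture the machinery and equipment indicators is keyword tagging over free text such as job adverts, with every occupant landing in the same record shape. The sketch below is an assumption-laden illustration: the tag vocabulary, the `OccupantSignals` record and the `tag_equipment` function are hypothetical, and real coverage would need a much larger vocabulary or an NLP classifier.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical vocabulary: tag name -> regex patterns that indicate it.
EQUIPMENT_PATTERNS = {
    "cnc_machining": [r"\bcnc\b"],
    "injection_moulding": [r"injection mould(?:ing|er)s?"],
    "ammonia_refrigeration": [r"ammonia refrigeration"],
    "compressed_air": [r"compressed air", r"\bair compressors?\b"],
}


@dataclass
class OccupantSignals:
    """Uniform record: same fields whether the occupant is a law firm or a factory."""
    uprn: str
    source: str  # e.g. "job_ad", "permit", "website"
    equipment_tags: List[str] = field(default_factory=list)


def tag_equipment(uprn: str, source: str, text: str) -> OccupantSignals:
    """Return the sorted set of equipment tags whose patterns appear in the text."""
    lowered = text.lower()
    tags = sorted(
        tag
        for tag, patterns in EQUIPMENT_PATTERNS.items()
        if any(re.search(p, lowered) for p in patterns)
    )
    return OccupantSignals(uprn=uprn, source=source, equipment_tags=tags)
```

An occupant with no matching signals still gets a record, just with an empty tag list, which keeps the structure consistent across all rows.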

Build and document the data layer for the modelling team

• Design schemas for long-term use and refresh

• Implement ETL/ELT workflows (ingest → clean → enrich → publish)

• Add basic data-quality checks and reporting

• Document sources, joins and assumptions so others can work confidently on top of your layer
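
A basic data-quality check of the kind listed above might look like the following sketch, which counts missing required values and duplicated keys in a batch of rows. The function name `quality_report` and the column names in the example are assumptions for illustration; in practice these checks would more likely run as SQL or inside the orchestration tool.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def quality_report(
    rows: List[Dict[str, object]],
    required: Iterable[str],
    key: Sequence[str],
) -> Dict[str, object]:
    """Summarise row count, missing required values and duplicate composite keys."""
    # A value counts as missing if the column is absent, None or an empty string.
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in required
    }
    # Count how many distinct composite keys occur more than once.
    key_counts = Counter(tuple(row.get(k) for k in key) for row in rows)
    duplicate_keys = sum(1 for n in key_counts.values() if n > 1)
    return {"rows": len(rows), "missing": missing, "duplicate_keys": duplicate_keys}
```

The key is composite (for example UPRN plus occupant) because a single building can legitimately hold several occupants, so repeated UPRNs alone are not an error.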

WHAT YOU SHOULD ALREADY HAVE DONE

• 3–6+ years as a Data Engineer, Data Acquisition Engineer or similar

• Proven experience scraping and integrating large public or government datasets at scale

• A track record of production scraping pipelines, not just one-off scripts

• Strong entity-resolution background:

– Fuzzy matching, deduplication, record linkage across messy sources

– Ideally with companies and addresses

• Experience turning unstructured information (websites, PDFs, job ads, photos) into structured variables

• Experience with UK data (ONS, EPC, VOA, NNDR, planning, AddressBase, etc.) is a strong plus

TECHNICAL SKILLS – MUST HAVE

• Strong Python:

– requests or httpx

– BeautifulSoup or lxml

– Scrapy and/or Playwright or Selenium for JS-heavy sites

• Strong SQL and experience with a relational warehouse (Postgres, BigQuery, Snowflake or similar)

• Experience with an orchestration tool: Airflow, Prefect, Dagster or similar

• Comfort with:

– Parallel and async scraping

– Proxy rotation and basic anti-bot strategies

– Designing and versioning schemas

– Normalising and matching UK addresses and postcodes

• Basic geospatial comfort:

– UPRN / UARN, postcodes, lat-long

– GeoPandas / Shapely / PostGIS at a practical level

• Git and collaborative development workflows

NICE TO HAVE

• Direct exposure to OS AddressBase, VOA, EPC, NNDR, INSPIRE polygons or similar datasets

• Experience in energy, utilities, carbon accounting or real-estate analytics

• Use of NLP for text classification and keyword tagging over large corpora

• Experience with graph databases for relationship modelling

WHAT KIND OF PERSON WILL FIT

• You like turning messy, inconsistent public data into clean, reliable tables

• You enjoy thinking about data models and feature design, not just writing scrapers

• You’re comfortable working closely with founders and making pragmatic trade-offs

• You care about building pipelines that can run repeatedly without babysitting
