Principal data engineer

London

Generative Group

Data engineer

€100,000 a year

Posted: 13 October

Offer description

Overview

Our client is a well-funded life sciences startup in stealth mode. They are seeking a Principal Data Engineer to lead the data and infrastructure systems powering a foundation model intended to transform drug development.

Responsibilities

* Lead data and infrastructure systems powering foundation model initiatives in drug development.
* Own data workflows end-to-end, from extraction and transformation to clean Parquet outputs for machine learning teams.
* Collaborate closely with wet lab teams and develop a practical understanding of assays and protocol development.
* Set up cloud data infrastructure from scratch, including compute, storage, networking, and access controls.
* Build reliable, repeatable pipelines with testing, version control, and clear documentation.
* Maintain data quality, lineage, and monitoring; implement sound data modeling practices.

Qualifications (Requirements)

* Principal-level data engineering experience in life sciences is essential.
* Experience owning data workflows end-to-end, from extraction to machine learning-ready outputs (Parquet).
* Hands-on familiarity with genomics data, including raw FASTQ files and Illumina sequencer outputs.
* Experience with metabolomics data, particularly untargeted mass spectrometry.
* Experience collaborating closely with wet lab teams, with a practical understanding of assays and protocol development.
* Experience building cloud data infrastructure from scratch (compute, storage, networking, access controls).
* Strong Python and SQL skills; proficient in data modeling, data quality, lineage, and monitoring.
* Ability to design and maintain reliable pipelines with testing and documentation.

Preferences

* Experience building data lakes or lakehouses and automating batch workflows (e.g., Airflow).
* Familiarity with NGS pipelines (quality control, alignment/assembly, variant calling) and mass spectrometry data analysis.
* Use of Infrastructure as Code (Terraform), containerization (Docker), and CI/CD for deploying data systems.
* Prior 0-to-1 startup experience and close collaboration with ML and biology teams.

Why Join

* Design and build cloud infrastructure and data pipelines powering distributed ML training and scalable biological data workflows, without legacy constraints.
* Work with first-of-their-kind, multi-modal datasets to support foundation model training at AlphaFold scale; this is a builder role with deep technical ownership.
* Join as a founding member of the engineering team with significant equity and end-to-end system ownership.
* See your work directly enable drug discoveries that will impact millions, collaborating with world-leading scientists in microbiome research and machine learning.

Location: London - 3 days onsite
Salary: £80,000 - £120,000 plus equity
