About the Team
The Velankar team at the Protein Data Bank in Europe (PDBe) maintains macromolecular structure databases that are essential resources for biologists and life scientists worldwide. PDBe is a founding partner of the Worldwide Protein Data Bank and develops the PDBe Knowledge Base (PDBe-KB) and the AlphaFold Protein Structure Database (AFDB). The team is international and interdisciplinary, consisting of expert data curators, bioinformaticians, scientific software developers and IT specialists.
Role Overview
We are looking for a Data Engineer to optimise and enhance data pipelines, ensuring efficient data processing, storage, and retrieval. The role involves analysing requirements, proposing new architecture, and implementing solutions to improve performance and scalability.
Responsibilities
* Analyse existing data pipelines and identify areas for improvement, optimisation and scalability.
* Collaborate with bioinformaticians and annotators to integrate pipelines with existing systems.
* Monitor pipeline performance, troubleshoot issues and implement solutions to maintain reliability.
* Stay current with industry trends and recommend new technologies or tools to enhance the infrastructure.
* Document pipelines, processes and workflows for internal reference and knowledge sharing.
Required Qualifications
* MSc in Computer Science, IT, or a related field.
* Expertise in data modelling, advanced SQL and relational database design.
* Proficiency in Python programming.
* Experience with ETL processes and tools for large‑scale data processing.
* Strong understanding of PostgreSQL, Oracle and MySQL/MariaDB, with experience across multiple RDBMS platforms.
* Proven database migration experience, particularly between Oracle and PostgreSQL.
* Knowledge of data warehousing solutions such as Redshift and BigQuery.
* Strong communication and collaboration skills.
* Proficiency in oral and written English.
Nice to Have
* PhD in a related field.
* Experience with big‑data technologies (Spark, Hadoop).
* Hands‑on experience with CI/CD (GitLab CI, GitHub Actions).
* Familiarity with Java, Google Cloud Platform or AWS.
* Experience with graph databases, AI/ML data modelling, or data visualisation tools (Tableau, Power BI).
* Knowledge of structural biology and bioinformatics.
* Experience working in international teams.
Working Conditions
Hybrid working: two days on‑site at the Wellcome Genome Campus, three days remote.
Contract: Grant‑based, 3 years.
Salary: Grade 5, starting at £3,303 per month after tax (excluding pension and insurance contributions), plus generous benefits.
Benefits
* Financial incentives: monthly family, child and non‑resident allowances, annual salary review, pension scheme, insurance benefits.
* Flexible working arrangements, private medical and dental insurance.
* 30 days annual leave plus public holidays.
* Relocation package with installation grant where required.
* Campus facilities: shuttle bus, on‑site library, gym and cafeteria, sports and social activities.
* Family support: on‑site nursery, child sick leave, generous parental leave.
* Non‑UK resident benefits: visa exemption, education grant, travel support and monthly allowance.
Diversity and Inclusion
EMBL is a signatory of DORA. We encourage applications from candidates of all genders, identities, nationalities and backgrounds.
How to Apply
Submit a cover letter and CV through our online system. Applications close on 28 June 2026.