About the team/job

Safety and toxicology concerns remain one of the most persistent challenges in drug discovery. This is an exciting opportunity to join a multi-disciplinary team on a project to develop a comprehensive open source side effect resource for the scientific and pharmaceutical community, and to provide structured and standardised training sets for AI/ML applications to improve early identification of safety liabilities. You will harness modern Natural Language Processing (NLP) techniques to extract data from a range of relevant resources, such as clinical trials, publications and drug labels. You will work closely with team members to ensure development of automated pipelines and effective integration into the ChEMBL database and Open Targets Platform, to assist users in the selection of safe and efficacious drug targets.

The Chemical Biology Services team at EMBL-EBI provides world-leading chemogenomics resources to the scientific community, including ChEMBL, a database of quantitative small-molecule bioactivity data that is curated primarily from the scientific literature and widely used to support drug discovery projects in industry and academia.

The Safety 2.0 project is funded by Open Targets, a unique public-private partnership working to deliver experimental data and informatics resources that enable scientists to make more informed decisions about target selection for developing safer and more effective drugs. You will interact with safety scientists from Open Targets pharmaceutical partners MSD, Genentech, GSK, Pfizer, and Sanofi to understand requirements and how to contribute to evaluating drug and target safety. You will be embedded at the world-leading EMBL-EBI, and will work collaboratively across the Chemical Biology Services and Open Targets groups, benefitting from a range of multi-disciplinary expertise and technologies.
Your role

We are looking for two enthusiastic and talented NLP data scientists, cheminformaticians or bioinformaticians with experience in NLP and knowledge extraction to join the Open Targets Safety 2.0 project for a period of 3 years. You should enjoy delving into ways of addressing challenges in knowledge extraction and data standardisation, and want to contribute to open source code and resources.

The project will develop a new side effect resource for drug discovery based on the extraction of side effect data from a range of documents. Your role will focus on developing data extraction pipelines using NLP models and implementing modern NLP methods and technologies suited to the extraction of safety data. The position provides a real opportunity to make a significant impact on a critical problem in drug discovery for the many users of the Open Targets Platform, and to contribute to the open source models and code associated with target safety.

This position will be situated across the Chemical Biology Services team and the Open Targets Core Team. You will work closely with other Safety 2.0 project team members to ensure effective delivery of work packages, and collaboratively with the Chemical Biology Services and Open Targets Core teams to ensure effective integration and longevity of pipelines and resources.

Key responsibilities

Develop machine learning pipelines for extracting drug side effects from drug labels, clinical trials, publications and other documents
Investigate modern NLP methodologies and propose ideas for the implementation of data extraction methods and pipelines
Apply language models to extract and map drug-related information from unstructured text, e.g. from the scientific literature and ClinicalTrials.gov
Implement and/or fine-tune different NLP models, e.g. NER models, transformer models, LLMs
Integrate project workflows with existing infrastructures in the EBI Chemical Biology Services and Open Targets teams
Prepare and evaluate benchmark datasets from the open domain as training sets for NLP models
Work with domain experts to develop new gold standards for NLP tasks where needed
Assist with and/or perform data curation to prepare clean and reliable training sets
Apply and/or adapt existing methods for mapping extracted entities to biomedical ontologies, e.g. drugs, side effects/phenotypes, and diseases
Work closely with Safety 2.0 project group members bridging the ChEMBL and Open Targets teams
Work closely with the Open Targets Core team to ensure seamless integration of data and workflows into the Open Targets Platform and long-term sustainability
Collaborate with the Open Targets Partners to assess, prioritise, validate and refine the developed methods
Disseminate the outcomes of the project to the scientific community and stakeholders through presentations and publications

You have

PhD, Master's or equivalent experience in computational linguistics, computer science, bioinformatics, or cheminformatics
Experience with language models, e.g. transformer models, LLMs, AI agents for information extraction
Experience with document and text preprocessing, cleaning and transformation techniques, including mapping to ontologies
Experience with data structures, data models and databases
Knowledge of cheminformatics resources and/or bioinformatics databases
Knowledge of data analysis and machine learning
Proficiency in Python
Knowledge of data frameworks, e.g. PySpark, pandas, Polars
Excellent attention to detail
Strong communication skills, both in presentations and verbally
Experience working collaboratively in a team-oriented environment
Able to work independently, to manage your time and work to deadlines

You might also have

Experience with the application of NLP methods to cheminformatics and/or biomedical domains
Experience with version control
Experience in safety/toxicology in industry or research

Other helpful information

Hybrid Working: At EMBL-EBI, we embrace a hybrid approach to work that supports both flexibility and community. Team members are usually on site at least three days a week, and a desk will always be available. We enjoy the energy of working together and encourage regular campus presence.
Interviews: We plan to hold introductory meetings with selected candidates remotely starting in February 2026.
Contract length: 3 years (project based)
Salary: Grade 5 to Grade 6, depending on experience and qualifications. Monthly salary starting at £3,229 to £3,612 (after tax but excl. pension & insurances), plus other paid benefits based on personal circumstances

Why join us

Do something meaningful

At EMBL-EBI you can apply your talent and passion to accelerate science and tackle some of humankind's greatest challenges. EMBL-EBI, part of the European Molecular Biology Laboratory, is a worldwide leader in the storage, analysis and dissemination of large biological datasets. We provide the global research community with access to publicly available databases and tools which are crucial for the advancement of healthcare, food security, and biodiversity.

Join a culture of innovation

We are located on the Wellcome Genome Campus, alongside other prominent research and biotech organisations, and surrounded by beautiful Cambridgeshire countryside. This is a highly collaborative and inclusive community where our employees enjoy a relaxed atmosphere.
We are committed to ensuring our employees feel valued, supported and empowered to reach their professional potential. Watch this video to see how EMBL-EBI makes an impact.

Enjoy lots of benefits:

Financial incentives: Monthly family, child and non-resident allowances, annual salary review, pension scheme, death benefit, long-term care, accident-at-work and unemployment insurances
Flexible working arrangements - including hybrid working patterns
Private medical insurance for you and your immediate family (including all prescriptions and generous dental & optical cover)
Generous time off: 30 days annual leave per year, in addition to public holidays
Relocation package including installation grant (if required)
Campus life: Free shuttle bus to and from work, on-site library, subsidised on-site gym and cafeteria, casual dress code, extensive sports and social club activities (on campus and remotely)
Family benefits: On-site nursery, 10 days of child sick leave, generous parental leave, holiday clubs on campus and monthly family and child allowances
Benefits for non-UK residents: Visa exemption, education grant for private schooling, financial support to travel back to your home country every second year and a monthly non-resident allowance.

For detailed information please visit our employee benefits page here.

What else you need to know

International applicants: We recruit internationally and successful candidates are offered visa exemptions. Please take a look at our International Applicants page for further information.
EMBL is a signatory of DORA. Find out how we apply DORA principles to our recruitment and performance assessment processes here.
Diversity and inclusion: At EMBL, we strongly believe that inclusive and diverse teams benefit from higher levels of innovation and creative thought. We encourage applications from women, LGBTQ individuals and individuals of all nationalities.
How to apply: To apply please submit a cover letter and a CV through our online system.
Applications will close at 23:59 CET on the date shown below. We aim to provide a response within two weeks after the closing date.

Closing Date: 11/01/2026