Principal Research Data Scientist

Do you want to help us improve human health and understand life on Earth? Make your mark by shaping the future to enable or deliver life-changing science to solve some of humanity’s greatest challenges.
We seek a Principal Machine Learning Research Data Scientist to join a collaborative project between the Wellcome Sanger Institute and Open Targets. This project aims to leverage datasets generated at the Sanger Institute and publicly available data from human cells to create foundational models for biology, enhancing our understanding of life's rules and improving health for all. You will work within an interdisciplinary team of life scientists and computer/ML scientists, with a shared objective of advancing biological research through these models. This role will sit within the AI/ML Faculty group led by Dr. Mohammad Lotfollahi, and the successful candidates, across different seniority levels, will be responsible for delivering their scientific research projects as part of the broader team strategy.

About the Role

Your role will involve designing foundational models leveraging multi-modal readouts, integrating and processing data from various sources to develop robust AI models. You will work with open-source software, proposing, developing, and maintaining solutions to analyze large-scale single-cell datasets. We have access to unique data and the capacity to generate data for training models, supported by substantial computational power and GPU resources. Our teams are experienced in generating and analyzing datasets, including millions of cells across tissues and conditions (e.g., disease, healthy). This requires a detailed understanding of training large-scale ML models and a track record of large data-science projects.

You will be responsible for:
● Managing and leading machine learning research projects and publishing results in scientific journals or conferences (ICLR, ICML, CVPR, etc.)
● Collaborating with team members to propose, develop, and evaluate machine learning models for understanding single-cell data and drug discovery applications
● Supervising and training Ph.D. students and postdocs in interdisciplinary scientific problems in biology
● Writing scientific papers on biotechnology and biology
● Distilling solutions into open-source, user-friendly packages with documentation for biologists and bioinformaticians
● Presenting research and pipelines to internal and external audiences

About You

You will be supported in your development and have opportunities to lead publications and present at conferences on genetics and genomics in drug discovery.

● Ph.D. or M.Sc. with relevant research experience in fields like Computer Science, Computational Biology, Genetics, Bioinformatics, Physics, Engineering, or Applied Mathematics
● Previous ML research experience in academic or scientific environments (including RA/Internships)
● Strong Python skills, including libraries like Scikit-Learn, SciPy, TensorFlow, and PyTorch
● Experience in designing, training, and deploying ML models
● Handling large datasets with techniques like data cleaning, feature engineering, and augmentation
● Experience with high-performance computing environments and GPUs
● Knowledge of NLP and transformer models like BERT and GPT
● Familiarity with generative models such as diffusion and flow matching
● Good software development practices and collaboration tools (git, Python packaging, code reviews)
● Strong analytical and problem-solving skills
● Excellent communication skills for explaining complex ML concepts to non-technical stakeholders
● Evidence of research experience in machine learning

In addition, you should demonstrate:
● Ability to understand complex scientific and technical challenges and break them down into actionable steps
● Flexibility to adapt in a changing environment
● Effective workload management and timely delivery
● Networking, influencing, and relationship-building skills
● Strategic thinking and seeing the bigger picture
● Ability to build collaborative relationships at all levels
● Respect and inclusivity

Relevant publications from the group include:
● Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nature Biotechnology.
● Lotfollahi, M. et al. scGen predicts single-cell perturbation responses. Nature Methods.
● Lotfollahi, M. et al. Biologically informed deep learning to query gene programs in single-cell atlases. Nature Cell Biology.