Overview
Applications are invited for a non-clinical PhD studentship (starting October 2026 or before) based within the Department of Public Health and Primary Care, University of Cambridge.
Academic Supervisor
Professor Angela Wood, Professor of Biostatistics and Health Data Science, Department of Public Health and Primary Care.
Project Title
Cancer Data Driven Detection: Handling missing data in cancer risk prediction models
Background
Cancer Data Driven Detection is a new, multidisciplinary and multi-institutional strategic national research programme dedicated to using data to transform our understanding of cancer risk and enable early interception of cancers. It represents a major, multi-million-pound flagship investment funded through a strategic programme award by Cancer Research UK, the National Institute for Health and Care Research, the Engineering and Physical Sciences Research Council, and the Peter Sowerby Foundation, in partnership with Health Data Research UK and the Economic and Social Research Council's Administrative Data Research UK programme.
Project Description
Early cancer diagnosis is often challenging for patients presenting with vague, non‑specific symptoms that may be linked to multiple cancer sites. This project aims to improve diagnostic decision‑making in such patients by developing advanced, equitable cancer risk prediction models that effectively handle missing and incomplete symptom data recorded in electronic health records (EHRs). The absence of a symptom code does not reliably indicate that the symptom did not occur, because symptom data are often incompletely captured. Symptom recording depends on multiple stages, from patient recognition and communication to clinician coding, and each stage introduces opportunities for information loss. This missingness is not random: it is influenced by clinical factors and varies across demographic and geographic groups, potentially reflecting broader inequalities in healthcare engagement and recording practices. Understanding and addressing these biases is crucial to avoid exacerbating health inequalities through prediction models that disproportionately benefit advantaged groups.
The project will systematically investigate how patterns of data completeness differ by patient and practice characteristics (e.g. age, sex, ethnicity, deprivation and geography), how these patterns evolve over time, and how they influence cancer risk estimates. Using large-scale linked EHR data, the project will employ appropriate statistical models, accounting for clustering at the practice level, to identify determinants of missingness in symptoms, blood test results and other key variables. These models will quantify the extent of variation and identify systematic differences in coding practices between providers and patient subgroups, while temporal analyses will assess how these patterns change over time. Building on these findings, the project will quantify how different patterns of missingness may affect risk prediction model performance and calibration.
Novel methods will be developed to incorporate incomplete or uncertain information, including delta‑adjustment imputation and other approaches that explicitly model symptom recording probabilities. Emphasis will be placed on ensuring reproducibility, interpretability and adaptability as data completeness evolves with changing healthcare practices. The ultimate goal is to produce robust, fair and clinically useful cancer risk prediction models that account for systematic biases in symptom data recording, ensuring that such models benefit all patient groups equitably. The work will also contribute to the methodological literature on missing data, with wider applications in predictive modelling across healthcare. The student will gain expertise in statistical modelling, simulation, electronic health record data science and fairness evaluation – skills directly aligned with modern data‑driven cancer research and clinical translation. Please refer to the attached document for further information on Outcomes and Research Environment.
Requirements
Applicants are expected to hold at least a 2:1 undergraduate degree (or equivalent) in a relevant subject such as statistics, mathematics, computer science, engineering, or a related biomedical or population health discipline, and may also have a Master’s degree in a quantitative or health data field. Applicants should be able to demonstrate excellent analytical and programming skills (e.g. in R or Python), experience working with health data and an enthusiasm for interdisciplinary research that bridges data science, healthcare and population health. Strong communication and teamwork skills are essential, and international applicants may need to provide evidence of English language proficiency.
We invite applications from UK and non‑UK students who meet the UK residency requirements (home fees). International students who can confirm that the additional cost of overseas tuition fees will be covered through other scholarships or funding schemes will also be considered.
The studentship provides the UKRI 2026 stipend rate, currently £20,780 annually.
Further information on possible sources of support for non‑UK applicants, including external funding opportunities, can be found at https://www.student-funding.cam.ac.uk/.
Applicants must meet the University of Cambridge entrance requirements: see https://www.postgraduate.study.cam.ac.uk/application-process/entry-requirements.
How to Apply
To apply, please visit https://www.postgraduate.study.cam.ac.uk/courses/directory/cvphpdhpc and click ‘Apply Now’.
Course Details
* Course: PhD in Public Health & Primary Care (Full‑time)
* Start Date: October 2026, Michaelmas Term (or before)
* Academic Supervisor(s): Professor Angela Wood, Department of Public Health and Primary Care
* Research Title: Cancer Data Driven Detection: Handling missing data in cancer risk prediction models
* Application requirements:
o Details of two academic referees (references will be taken up immediately).
o Transcript(s).
o CV/resume.
o Evidence of competence in English.
o Statement of Interest outlining your suitability, why you are interested in a PhD in this area, your background and research interests.
Interview and Selection Process
The deadline for applications is Monday 9th March 2026.
Applicants will be notified of the outcome of their application by 16th March 2026.
Shortlisted candidates will be invited to interview in the week commencing 23rd March 2026.
Applicants will be notified of the outcome of their interview soon after.
Data & Equality Statements
For information about how your personal data is used as an applicant, please see the Applicant Data section of our HR web pages: https://www.hr.admin.cam.ac.uk/hr-staff/hr-data/applicant-data
Please quote reference RH48811 on your application and in any correspondence about this vacancy.
The University actively supports equality, diversity and inclusion and encourages applications from all sections of society.
The University has a responsibility to ensure that all employees are eligible to live and work in the UK.