Joe Ledsam

Joe is a clinician scientist at Google Japan focusing on the application of AI to health and science. Prior to joining Google Japan, Joe spent four years at DeepMind leading multiple research projects across medical imaging and electronic health records, as well as founding the DeepMind Genomics team. Joe remains an active collaborator with Google Health, DeepMind and other research groups throughout Google. He obtained his medical degree from The University of Leeds, UK, and was a research fellow at University College London during his years in clinical practice.
Authored Publications
    Artificial intelligence as a second reader for screening mammography
    Etsuji Nakai
    Alessandro Scoccia Pappagallo
    Hiroki Kayama
    Lin Yang
    Shawn Xu
    Timo Kohlberger
    Daniel Golden
    Akib Uddin
    Radiology Advances, 1(2)(2024)
    Background Artificial intelligence (AI) has shown promise in mammography interpretation, and its use as a second reader in breast cancer screening may reduce the burden on health care systems. Purpose To evaluate the performance differences between routine double read and an AI as a second reader workflow (AISR), where the second reader is replaced with AI. Materials and Methods A cohort of patients undergoing routine breast cancer screening at a single center with mammography was retrospectively collected between 2005 and 2021. A model developed on US and UK data was fine-tuned on Japanese data. We subsequently performed a reader study with 10 qualified readers with varied experience (5 reader pairs), comparing routine double read to an AISR workflow. Results A “test set” of 4,059 women (mean age, 56 ± 14 years; 157 positive, 3,902 negative) was collected, with 278 (mean age 55 ± 13 years; 90 positive, 188 negative) evaluated for the reader study. We demonstrate an area under the curve of .84 (95% confidence interval [CI], 0.805-0.881) on the test set, with no significant difference to decisions made in clinical practice (P = .32). Compared with routine double reading, in the AISR arm, sensitivity improved by 7.6% (95% CI, 3.80-11.4; P = .00004) and specificity decreased 3.4% (1.42-5.43; P = .0016), with 71% (212/298) of scans no longer requiring input from a second reader. Variation in recall decision between reader pairs improved from a Cohen kappa of κ = .65 (96% CI, 0.61-0.68) to κ = .74 (96% CI, 0.71-0.77) in the AISR arm.
    Assistive AI in Lung Cancer Screening: A Retrospective Multinational Study in the United States and Japan
    Atilla Kiraly
    Corbin Cunningham
    Ryan Najafi
    Jie Yang
    Chuck Lau
    Diego Ardila
    Scott Mayer McKinney
    Rory Pilgrim
    Mozziyar Etemadi
    Sunny Jansen
    Lily Peng
    Shravya Shetty
    Neeral Beladia
    Krish Eswaran
    Radiology: Artificial Intelligence(2024)
    Lung cancer is the leading cause of cancer death worldwide with 1.8 million deaths in 2020. Studies have concluded that low-dose computed tomography lung cancer screening can reduce mortality by up to 61% and updated 2021 US guidelines expanded eligibility. As screening efforts rise, AI can play an important role, but must be unobtrusively integrated into existing clinical workflows. In this work, we introduce a state-of-the-art, cloud-based AI system providing lung cancer risk assessments without requiring any user input. We demonstrate its efficacy in assisting lung cancer screening under both US and Japanese screening settings using different patient populations and screening protocols. Technical improvements over a previously described system include a focus on earlier cancer detection for improved accuracy, introduction of an effective assistive user interface, and a system designed to integrate into typical clinical workflows. The stand-alone AI system was evaluated on 3085 individuals achieving area under the curve (AUC) scores of 91.7% (95%CI [89.6, 95.2]), 93.3% (95%CI [90.2, 95.7]), and 89.1% (95%CI [77.7, 97.3]) on three datasets (two from US and one from Japan), respectively. To evaluate the system’s assistive ability, we conducted two retrospective multi-reader multi-case studies on 627 cases read by experienced board-certified radiologists (average 20 years of experience [7,40]) using local PACS systems in the respective US and Japanese screening settings. The studies measured the reader’s level of suspicion (LoS) and categorical responses for scores and management recommendations under country-specific screening protocols. The radiologists’ AUC for LoS increased with AI assistance by 2.3% (95%CI [0.1-4.5], p=0.022) for the US study and by 2.3% (95%CI [-3.5-8.1], p=0.179) for the Japan study. 
Specificity for recalls increased by 5.5% (95%CI [2.7-8.5], p<0.0001) for the US and 6.7% (95%CI [4.7-8.7], p<0.0001) for the Japan study. No significant reduction in other metrics occurred. This work advances the state-of-the-art in lung cancer detection, introduces generalizable interface concepts that can be applicable to similar AI applications, and demonstrates its potential impact on diagnostic AI in global lung cancer screening with results suggesting a substantial drop in unnecessary follow-up procedures without impacting sensitivity.
    Use of deep learning to develop continuous-risk models for adverse event prediction from electronic health records
    Nenad Tomašev
    Sebastien Baur
    Anne Mottram
    Xavier Glorot
    Jack William Rae
    Michal Zielinski
    Harry Askham
    Andre Saraiva
    Valerio Magliulo
    Clemens Meyer
    Suman Venkatesh Ravuri
    Alistair Connell
    Cían Hughes
    Julien Cornebise
    Hugh Montgomery
    Geraint Rees
    Christopher Laing
    Clifton R. Baker
    Thomas Osborne
    Ruth Reeves
    Demis Hassabis
    Dominic King
    Mustafa Suleyman
    Trevor John Back
    Christopher Nielsen
    Martin Gamunu Seneviratne
    Shakir Mohamed
    Nature Protocols(2021)
    Early prediction of patient outcomes is important for targeting preventive care. This protocol describes a practical workflow for developing deep-learning risk models that can predict various clinical and operational outcomes from structured electronic health record (EHR) data. The protocol comprises five main stages: formal problem definition, data pre-processing, architecture selection, calibration and uncertainty, and generalizability evaluation. We have applied the workflow to four endpoints (acute kidney injury, mortality, length of stay and 30-day hospital readmission). The workflow can enable continuous (e.g., triggered every 6 h) and static (e.g., triggered at 24 h after admission) predictions. We also provide an open-source codebase that illustrates some key principles in EHR modeling. This protocol can be used by interdisciplinary teams with programming and clinical expertise to build deep-learning prediction models with alternate data sources and prediction tasks.
    Clinically Applicable Segmentation of Head and Neck Anatomy for Radiotherapy: Deep Learning Algorithm Development and Validation Study
    Stanislav Nikolov
    Sam Blackwell
    Alexei Zverovitch
    Ruheena Mendes
    Michelle Livne
    Jeffrey De Fauw
    Yojan Patel
    Clemens Meyer
    Harry Askham
    Bernardino Romera Paredes
    Carlton Chu
    Dawn Carnell
    Cheng Boon
    Derek D'Souza
    Syed Moinuddin
    Yasmin Mcquinlan
    Sarah Ireland
    Kiarna Hampton
    Krystle Fuller
    Hugh Montgomery
    Geraint Rees
    Mustafa Suleyman
    Trevor John Back
    Cían Hughes
    Olaf Ronneberger
    JMIR(2021)
    Background: Over half a million individuals are diagnosed with head and neck cancer each year globally. Radiotherapy is an important curative treatment for this disease, but it requires manual time to delineate radiosensitive organs at risk. This planning process can delay treatment while also introducing interoperator variability, resulting in downstream radiation dose differences. Although auto-segmentation algorithms offer a potentially time-saving solution, the challenges in defining, quantifying, and achieving expert performance remain. Objective: Adopting a deep learning approach, we aim to demonstrate a 3D U-Net architecture that achieves expert-level performance in delineating 21 distinct head and neck organs at risk commonly segmented in clinical practice. Methods: The model was trained on a data set of 663 deidentified computed tomography scans acquired in routine clinical practice and with both segmentations taken from clinical practice and segmentations created by experienced radiographers as part of this research, all in accordance with consensus organ at risk definitions. Results: We demonstrated the model’s clinical applicability by assessing its performance on a test set of 21 computed tomography scans from clinical practice, each with 21 organs at risk segmented by 2 independent experts. We also introduced surface Dice similarity coefficient, a new metric for the comparison of organ delineation, to quantify the deviation between organ at risk surface contours rather than volumes, better reflecting the clinical task of correcting errors in automated organ segmentations. The model’s generalizability was then demonstrated on 2 distinct open-source data sets, reflecting different centers and countries to model training. Conclusions: Deep learning is an effective and clinically applicable technique for the segmentation of the head and neck anatomy for radiotherapy. 
With appropriate validation studies and regulatory approvals, this system could improve the efficiency, consistency, and safety of radiotherapy pathways.
    Google and DeepMind: Deep Learning Systems in Ophthalmology
    Xinle Liu
    Akinori Mitani
    Terry Spitz
    Derek Wu
    Artificial Intelligence in Ophthalmology(2021)
    Deep learning has a profound potential to improve patient outcomes. To achieve this, a holistic, patient-centered approach is crucial. In ophthalmology, artificial intelligence studies have spanned a diverse spectrum including algorithm development, human computer interaction, clinical validation, and novel biomarker discovery. In this chapter we highlight the work of Google and DeepMind in these areas, as a set of end-to-end case studies for developing and implementing artificial intelligence in clinical practice.
    A prospective evaluation of AI-augmented epidemiology to forecast COVID-19 in the USA and Japan
    Joel Shor
    Arkady Epshteyn
    Ashwin Sura Ravi
    Beth Luan
    Chun-Liang Li
    Daisuke Yoneoka
    Dario Sava
    Hiroaki Miyata
    Hiroki Kayama
    Isaac Jones
    Joe Mckenna
    Johan Euphrosine
    Kris Popendorf
    Nate Yoder
    Shashank Singh
    Shuhei Nomura
    Thomas Tsai
    npj Digital Medicine(2021)
    The COVID-19 pandemic has highlighted the global need for reliable models of disease spread. We evaluate an AI-improved forecasting approach that provides daily predictions of the expected number of confirmed COVID-19 deaths, cases and hospitalizations during the following 28 days. We present an international, prospective evaluation of model performance across all states and counties in the USA and prefectures in Japan. National mean absolute percentage error (MAPE) for predicting COVID-19 associated deaths before and after prospective deployment remained consistently <3% (US) and <10% (Japan). Average statewide (US) and prefecture-wide (Japan) MAPE was 6% and 20% respectively (14% when looking at prefectures with more than 10 deaths). We show our model performs well even during periods of considerable change in population behavior, and that it is robust to demographic differences across different geographic locations. We further demonstrate the model provides meaningful explanatory insights, finding that the model appropriately responds to local and national policy interventions. Our model enables counterfactual simulations, which indicate continuing NPIs alongside vaccinations is essential for more rapidly recovering from the pandemic, delaying the application of interventions has a detrimental effect, and allow exploration of the consequences of different vaccination strategies. The COVID-19 pandemic remains a global emergency. In the face of substantial challenges ahead, the approach presented here has the potential to inform critical decisions.
    Predicting conversion to wet age-related macular degeneration using deep learning
    Jason Yim
    Reena Chopra
    Terry Spitz
    Annette Obika
    Harry Askham
    Marko Lukic
    Josef Huemer
    Katrin Fasler
    Gabriella Moraes
    Clemens Meyer
    Marc Wilson
    Jonathan Mark Dixon
    Cían Hughes
    Geraint Rees
    Peng Khaw
    Dominic King
    Demis Hassabis
    Mustafa Suleyman
    Trevor John Back
    Pearse Keane
    Jeffrey De Fauw
    Nature Medicine(2020)
    Progression to exudative ‘wet’ age-related macular degeneration (exAMD) is a major cause of visual deterioration. In patients diagnosed with exAMD in one eye, we introduce an artificial intelligence (AI) system to predict progression to exAMD in the second eye. By combining models based on three-dimensional (3D) optical coherence tomography images and corresponding automatic tissue maps, our system predicts conversion to exAMD within a clinically actionable 6-month time window, achieving a per-volumetric-scan sensitivity of 80% at 55% specificity, and 34% sensitivity at 90% specificity. This level of performance corresponds to true positives in 78% and 41% of individual eyes, and false positives in 56% and 17% of individual eyes at the high sensitivity and high specificity points, respectively. Moreover, we show that automatic tissue segmentation can identify anatomical changes before conversion and high-risk subgroups. This AI system overcomes substantial interobserver variability in expert predictions, performing better than five out of six experts, and demonstrates the potential of using AI to predict disease progression.
    International evaluation of an AI system for breast cancer screening
    Scott Mayer McKinney
    Varun Yatindra Godbole
    Jonathan Godwin
    Natasha Antropova
    Hutan Ashrafian
    Trevor John Back
    Mary Chesus
    Ara Darzi
    Mozziyar Etemadi
    Florencia Garcia-Vicente
    Fiona J Gilbert
    Mark D Halling-Brown
    Demis Hassabis
    Sunny Jansen
    Dominic King
    David Melnick
    Hormuz Mostofi
    Lily Hao Yi Peng
    Joshua Reicher
    Bernardino Romera Paredes
    Richard Sidebottom
    Mustafa Suleyman
    Kenneth C. Young
    Jeffrey De Fauw
    Shravya Ramesh Shetty
    Nature(2020)
    Screening mammography aims to identify breast cancer at earlier stages of the disease, when treatment can be more successful. Despite the existence of screening programmes worldwide, the interpretation of mammograms is affected by high rates of false positives and false negatives. Here we present an artificial intelligence (AI) system that is capable of surpassing human experts in breast cancer prediction. To assess its performance in the clinical setting, we curated a large representative dataset from the UK and a large enriched dataset from the USA. We show an absolute reduction of 5.7% and 1.2% (USA and UK) in false positives and 9.4% and 2.7% in false negatives. We provide evidence of the ability of the system to generalize from the UK to the USA. In an independent study of six radiologists, the AI system outperformed all of the human readers: the area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. We ran a simulation in which the AI system participated in the double-reading process that is used in the UK, and found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%. This robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening.
    Predicting OCT-derived DME grades from fundus photographs using deep learning
    Arunachalam Narayanaswamy
    Avinash Vaidyanathan Varadarajan
    Dr. Paisan Raumviboonsuk
    Dr. Peranut Chotcomwongse
    Jorge Cuadros
    Lily Hao Yi Peng
    Pearse Keane
    Subhashini Venugopalan
    Nature Communications(2020)
    Diabetic eye disease is one of the fastest growing causes of preventable blindness. With the advent of anti-VEGF therapies, it has become increasingly important to detect center-involved DME (ci-DME). However, ci-DME is diagnosed using optical coherence tomography (OCT), which is not generally available at screening sites. Instead, screening programs rely on the detection of hard exudates as a proxy for DME on color fundus photographs, but this often results in a fair number of false positive and false negative calls. We trained a deep learning model to use color fundus images to directly predict grades derived from OCT exams for DME. Our OCT-based model had an AUC of 0.89 (95% CI: 0.87-0.91), which corresponds to a sensitivity of 85% at a specificity of 80%. In comparison, the ophthalmology graders had sensitivities ranging from 82%-85% and specificities ranging from 44%-50%. These metrics correspond to a PPV of 61% (95% CI: 56%-66%) for the OCT-based algorithm and a range of 36%-38% (95% CI ranging from 33%-42%) for ophthalmologists. In addition, we used multiple attention techniques to explain how the model is making its prediction. The ability of deep learning algorithms to make clinically relevant predictions that generally require sophisticated 3D-imaging equipment from simple 2D images has broad relevance to many other applications in medical imaging.
    A clinically applicable approach to continuous prediction of future acute kidney injury
    Nenad Tomašev
    Xavier Glorot
    Jack W Rae
    Michal Zielinski
    Harry Askham
    Andre Saraiva
    Anne Mottram
    Clemens Meyer
    Suman Ravuri
    Alistair Connell
    Cían O Hughes
    Julien Cornebise
    Hugh Montgomery
    Geraint Rees
    Chris Laing
    Clifton R Baker
    Kelly Peterson
    Ruth Reeves
    Demis Hassabis
    Dominic King
    Mustafa Suleyman
    Trevor Back
    Christopher Nielson
    Shakir Mohamed
    Nature, 572(2019), pp. 116-119
    The early prediction of deterioration could have an important role in supporting healthcare professionals, as an estimated 11% of deaths in hospital follow a failure to promptly recognize and treat deteriorating patients. To achieve this goal requires predictions of patient risk that are continuously updated and accurate, and delivered at an individual level with sufficient context and enough time to act. Here we develop a deep learning approach for the continuous risk prediction of future deterioration in patients, building on recent work that models adverse events from electronic health records and using acute kidney injury—a common and potentially life-threatening condition—as an exemplar. Our model was developed on a large, longitudinal dataset of electronic health records that cover diverse clinical environments, comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. Our model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of all acute kidney injuries that required subsequent administration of dialysis, with a lead time of up to 48 h and a ratio of 2 false alerts for every true alert. In addition to predicting future acute kidney injury, our model provides confidence assessments and a list of the clinical features that are most salient to each prediction, alongside predicted future trajectories for clinically relevant blood tests. Although the recognition and prompt treatment of acute kidney injury is known to be challenging, our approach may offer opportunities for identifying patients at risk within a time window that enables early treatment.
    A Probabilistic U-Net for Segmentation of Ambiguous Images
    Ali Eslami
    Bernardino Romera Paredes
    Clemens Meyer
    Danilo Jimenez Rezende
    Jeffrey De Fauw
    Klaus H. Maier-Hein
    Olaf Ronneberger
    Simon Kohl
    ArXiv(2018)
    Many real-world vision problems suffer from inherent ambiguities. In clinical applications for example, it might not be clear from a CT scan alone which particular region is cancer tissue. Therefore a group of graders typically produces a set of diverse but plausible segmentations. We consider the task of learning a distribution over segmentations given an input. To this end we propose a generative segmentation model based on a combination of a U-net with a conditional variational autoencoder that is capable of efficiently producing an unlimited number of plausible hypotheses. We show on a lung abnormalities segmentation task and on a Cityscapes segmentation task that our model reproduces the possible segmentation variants as well as the frequencies with which they occur, doing so significantly better than published approaches. These models could have a high impact in real-world applications, such as being used as clinical decision-making algorithms accounting for multiple plausible semantic segmentation hypotheses to provide possible diagnoses and recommend further actions to resolve the present ambiguities.
    Clinically applicable deep learning for diagnosis and referral in retinal optical coherence tomography
    Jeffrey De Fauw
    Bernardino Romera Paredes
    Stanislav Nikolov Nikolov
    Nenad Tomašev
    Sam Julian Blackwell
    Harry Askham
    Xavier Glorot
    Brendan O'Donoghue
    Daniel James Visentin
    George van den Driessche
    Clemens Meyer
    Faith Mackinder
    Simon Bouton
    Kareem Ayoub
    Reena Chopra
    Dominic King
    Cían Hughes
    Rosalind Raine
    Julian Hughes
    Dawn Sim
    Catherine Egan
    Adnan Tufail
    Hugh Montgomery
    Demis Hassabis
    Geraint Rees
    Trevor John Back
    Peng Khaw
    Mustafa Suleyman
    Julien Cornebise
    Pearse Keane
    Olaf Ronneberger
    Nature(2018)
    The volume and complexity of diagnostic imaging is increasing at a pace faster than the availability of human expertise to interpret it. Artificial intelligence has shown great promise in classifying two-dimensional photographs of some common diseases and typically relies on databases of millions of annotated images. Until now, the challenge of reaching the performance of expert clinicians in a real-world clinical pathway with three-dimensional diagnostic scans has remained unsolved. Here, we apply a novel deep learning architecture to a clinically heterogeneous set of three-dimensional optical coherence tomography scans from patients referred to a major eye hospital. We demonstrate performance in making a referral recommendation that reaches or exceeds that of experts on a range of sight-threatening retinal diseases after training on only 14,884 scans. Moreover, we demonstrate that the tissue segmentations produced by our architecture act as a device-independent representation; referral accuracy is maintained when using tissue segmentations from a different type of device. Our work removes previous barriers to wider clinical use without prohibitive training data requirements across multiple pathologies in a real-world setting.
    Objective: Refractive error, one of the leading causes of visual impairment, can be corrected by simple interventions like prescribing eyeglasses, which often starts with autorefraction to estimate the refractive error. In this study, using deep learning, we trained a network to estimate refractive error from fundus photos only. Design: Retrospective analysis. Subjects, Participants, and/or Controls: Retinal fundus images from participants in the UK Biobank cohort, which were 45 degree field of view images and the AREDS clinical trial, which contained 30 degree field of view images. Methods, Intervention, or Testing: Refractive error was measured by autorefraction in the UK Biobank dataset and subjective refraction in the AREDS dataset. We trained a deep learning algorithm to predict refractive error from the fundus photographs and tested the prediction of the algorithm to the documented refractive error measurement. Our model used attention for identifying features that are predictive for refractive error. Main Outcome Measures: Mean average error (MAE) of the algorithm’s prediction compared to the refractive error obtained in the AREDS and UK Biobank. Results: The resulting algorithm had a mean average error (MAE) of 0.56 diopters (95% CI: 0.55-0.56) for estimating spherical equivalent on the UK Biobank dataset and 0.91 diopters (95% CI: 0.89-0.92) for the AREDS dataset. The baseline expected MAE (obtained by simply predicting the mean of this population) is 1.81 diopters (95% CI: 1.79-1.84) for UK Biobank and 1.63 (95% CI: 1.60-1.67) for AREDS. Attention maps suggest that the foveal region is one of the most important areas that is used by the algorithm to make this prediction, though other regions also contribute to the prediction. 
Conclusions: The ability to estimate refractive error with high accuracy from retinal fundus photos has not been previously known and demonstrates that deep learning can be applied to make novel predictions from medical images. In addition, given that several groups have recently shown that it is feasible to obtain retinal fundus photos using mobile phones and inexpensive attachments, this work may be particularly relevant in regions of the world where autorefractors may not be readily available.