Joe Ledsam

Joe Ledsam

Joe is clinician scientist in Google Japan focusing on the application of AI to health and science. Prior to joining Google Japan, Joe spent four years in DeepMind leading multiple research projects across medical imaging and electronic health records, as well as founding the DeepMind Genomics team. Joe remains an active collaborator with Google Health, DeepMind and other research groups throughout Google. He obtained his medical degree from The University of Leeds, UK, and was a research fellow at University College London during his years in clinical practice.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Artificial intelligence as a second reader for screening mammography
    Etsuji Nakai
    Alessandro Scoccia Pappagallo
    Hiroki Kayama
    Lin Yang
    Shawn Xu
    Christopher Kelly
    Timo Kohlberger
    Daniel Golden
    Akib Uddin
    Radiology Advances, 1(2) (2024)
    Preview abstract Background Artificial intelligence (AI) has shown promise in mammography interpretation, and its use as a second reader in breast cancer screening may reduce the burden on health care systems. Purpose To evaluate the performance differences between routine double read and an AI as a second reader workflow (AISR), where the second reader is replaced with AI. Materials and Methods A cohort of patients undergoing routine breast cancer screening at a single center with mammography was retrospectively collected between 2005 and 2021. A model developed on US and UK data was fine-tuned on Japanese data. We subsequently performed a reader study with 10 qualified readers with varied experience (5 reader pairs), comparing routine double read to an AISR workflow. Results A “test set” of 4,059 women (mean age, 56 ± 14 years; 157 positive, 3,902 negative) was collected, with 278 (mean age 55 ± 13 years; 90 positive, 188 negative) evaluated for the reader study. We demonstrate an area under the curve =.84 (95% confidence interval [CI], 0.805-0.881) on the test set, with no significant difference to decisions made in clinical practice (P = .32). Compared with routine double reading, in the AISR arm, sensitivity improved by 7.6% (95% CI, 3.80-11.4; P = .00004) and specificity decreased 3.4% (1.42-5.43; P = .0016), with 71% (212/298) of scans no longer requiring input from a second reader. Variation in recall decision between reader pairs improved from a Cohen kappa of κ = .65 (96% CI, 0.61-0.68) to κ = .74 (96% CI, 0.71-0.77) in the AISR arm. View details
    Assistive AI in Lung Cancer Screening: A Retrospective Multinational Study in the United States and Japan
    Atilla Kiraly
    Corbin Cunningham
    Ryan Najafi
    Jie Yang
    Chuck Lau
    Diego Ardila
    Scott Mayer McKinney
    Rory Pilgrim
    Mozziyar Etemadi
    Sunny Jansen
    Lily Peng
    Shravya Shetty
    Neeral Beladia
    Krish Eswaran
    Radiology: Artificial Intelligence (2024)
    Preview abstract Lung cancer is the leading cause of cancer death world-wide with 1.8 million deaths in 20201. Studies have concluded that low-dose computed tomography lung cancer screening can reduce mortality by up to 61%2 and updated 2021 US guidelines expanded eligibility. As screening efforts rise, AI can play an important role, but must be unobtrusively integrated into existing clinical workflows. In this work, we introduce a state-of-the-art, cloud-based AI system providing lung cancer risk assessments without requiring any user input. We demonstrate its efficacy in assisting lung cancer screening under both US and Japanese screening settings using different patient populations and screening protocols. Technical improvements over a previously described system include a focus on earlier cancer detection for improved accuracy, introduction of an effective assistive user interface, and a system designed to integrate into typical clinical workflows. The stand-alone AI system was evaluated on 3085 individuals achieving area under the curve (AUC) scores of 91.7% (95%CI [89.6, 95.2]), 93.3% (95%CI [90.2, 95.7]), and 89.1% (95%CI [77.7, 97.3]) on three datasets (two from US and one from Japan), respectively. To evaluate the system’s assistive ability, we conducted two retrospective multi-reader multi-case studies on 627 cases read by experienced board certified radiologists (average 20 years of experience [7,40]) using local PACS systems in the respective US and Japanese screening settings. The studies measured the reader’s level of suspicion (LoS) and categorical responses for scores and management recommendations under country-specific screening protocols. The radiologists’ AUC for LoS increased with AI assistance by 2.3% (95%CI [0.1-4.5], p=0.022) for the US study and by 2.3% (95%CI [-3.5-8.1], p=0.179) for the Japan study. Specificity for recalls increased by 5.5% (95%CI [2.7-8.5], p<0.0001) for the US and 6.7% (95%CI [4.7-8.7], p<0.0001) for the Japan study. No significant reduction in other metrics occured. This work advances the state-of-the-art in lung cancer detection, introduces generalizable interface concepts that can be applicable to similar AI applications, and demonstrates its potential impact on diagnostic AI in global lung cancer screening with results suggesting a substantial drop in unnecessary follow-up procedures without impacting sensitivity. View details
    Clinically Applicable Segmentation of Head and Neck Anatomy for Radiotherapy: Deep Learning Algorithm Development and Validation Study
    Stanislav Nikolov
    Sam Blackwell
    Alexei Zverovitch
    Ruheena Mendes
    Michelle Livne
    Jeffrey De Fauw
    Yojan Patel
    Clemens Meyer
    Harry Askham
    Bernardino Romera Paredes
    Christopher Kelly
    Carlton Chu
    Dawn Carnell
    Cheng Boon
    Derek D'Souza
    Syed Moinuddin
    Yasmin Mcquinlan
    Sarah Ireland
    Kiarna Hampton
    Krystle Fuller
    Hugh Montgomery
    Geraint Rees
    Mustafa Suleyman
    Trevor John Back
    Cían Hughes
    Olaf Ronneberger
    JMIR (2021)
    Preview abstract Background: Over half a million individuals are diagnosed with head and neck cancer each year globally. Radiotherapy is an important curative treatment for this disease, but it requires manual time to delineate radiosensitive organs at risk. This planning process can delay treatment while also introducing interoperator variability, resulting in downstream radiation dose differences. Although auto-segmentation algorithms offer a potentially time-saving solution, the challenges in defining, quantifying, and achieving expert performance remain. Objective: Adopting a deep learning approach, we aim to demonstrate a 3D U-Net architecture that achieves expert-level performance in delineating 21 distinct head and neck organs at risk commonly segmented in clinical practice. Methods: The model was trained on a data set of 663 deidentified computed tomography scans acquired in routine clinical practice and with both segmentations taken from clinical practice and segmentations created by experienced radiographers as part of this research, all in accordance with consensus organ at risk definitions. Results: We demonstrated the model’s clinical applicability by assessing its performance on a test set of 21 computed tomography scans from clinical practice, each with 21 organs at risk segmented by 2 independent experts. We also introduced surface Dice similarity coefficient, a new metric for the comparison of organ delineation, to quantify the deviation between organ at risk surface contours rather than volumes, better reflecting the clinical task of correcting errors in automated organ segmentations. The model’s generalizability was then demonstrated on 2 distinct open-source data sets, reflecting different centers and countries to model training. Conclusions: Deep learning is an effective and clinically applicable technique for the segmentation of the head and neck anatomy for radiotherapy. With appropriate validation studies and regulatory approvals, this system could improve the efficiency, consistency, and safety of radiotherapy pathways. View details
    Use of deep learning to develop continuous-risk models for adverse event prediction from electronic health records
    Nenad Tomašev
    Sebastien Baur
    Anne Mottram
    Xavier Glorot
    Jack William Rae
    Michal Zielinski
    Harry Askham
    Andre Saraiva
    Valerio Magliulo
    Clemens Meyer
    Suman Venkatesh Ravuri
    Alistair Connell
    Cían Hughes
    Julien Cornebise
    Hugh Montgomery
    Geraint Rees
    Christopher Laing
    Clifton R. Baker
    Thomas Osborne
    Ruth Reeves
    Demis Hassabis
    Dominic King
    Mustafa Suleyman
    Trevor John Back
    Christopher Nielsen
    Martin Gamunu Seneviratne
    Shakir Mohamad
    Nature Protocols (2021)
    Preview abstract Early prediction of patient outcomes is important for targeting preventive care. This protocol describes a practical workflow for developing deep-learning risk models that can predict various clinical and operational outcomes from structured electronic health record (EHR) data. The protocol comprises five main stages: formal problem definition, data pre-processing, architecture selection, calibration and uncertainty, and generalizability evaluation. We have applied the workflow to four endpoints (acute kidney injury, mortality, length of stay and 30-day hospital readmission). The workflow can enable continuous (e.g., triggered every 6 h) and static (e.g., triggered at 24 h after admission) predictions. We also provide an open-source codebase that illustrates some key principles in EHR modeling. This protocol can be used by interdisciplinary teams with programming and clinical expertise to build deep-learning prediction models with alternate data sources and prediction tasks. View details
    A prospective evaluation of AI-augmented epidemiology to forecast COVID-19 in the USA and Japan
    Joel Shor
    Arkady Epshteyn
    Ashwin Sura Ravi
    Beth Luan
    Chun-Liang Li
    Daisuke Yoneoka
    Dario Sava
    Hiroaki Miyata
    Hiroki Kayama
    Isaac Jones
    Joe Mckenna
    Johan Euphrosine
    Kris Popendorf
    Nate Yoder
    Shashank Singh
    Shuhei Nomura
    Thomas Tsai
    npj Digital Medicine (2021)
    Preview abstract The COVID-19 pandemic has highlighted the global need for reliable models of disease spread. We evaluate an AI-improved forecasting approach that provides daily predictions of the expected number of confirmed COVID-19 deaths, cases and hospitalizations during the following 28 days. We present an international, prospective evaluation of model performance across all states and counties in the USA and prefectures in Japan. National mean absolute percentage error (MAPE) for predicting COVID-19 associated deaths before and after prospective deployment remained consistently <3% (US) and <10% (Japan). Average statewide (US) and prefecture wide (Japan) MAPE was 6% and 20% respectively (14% when looking at prefectures with more than 10 deaths).We show our model performs well even during periods of considerable change in population behavior, and that it is robust to demographic differences across different geographic locations.We further demonstrate the model provides meaningful explanatory insights, finding that the model appropriately responds to local and national policy interventions. Our model enables counterfactual simulations, which indicate continuing NPIs alongside vaccinations is essential for more rapidly recovering from the pandemic, delaying the application of interventions has a detrimental effect, and allow exploration of the consequences of different vaccination strategies. The COVID-19 pandemic remains a global emergency. In the face of substantial challenges ahead, the approach presented here has the potential to inform critical decisions. View details
    Google and DeepMind: Deep Learning Systems in Ophthalmology
    Xinle Liu
    Akinori Mitani
    Terry Spitz
    Derek Wu
    Artificial Intelligence in Ophthalmology (2021)
    Preview abstract Deep learning has a profound potential to improve patient outcomes. To achieve this, a holistic, patient-centered approach is crucial. In ophthalmology, artificial intelligence studies have spanned a diverse spectrum including algorithm development, human computer interaction, clinical validation, and novel biomarker discovery. In this chapter we highlight the work of Google and DeepMind in these areas, as a set of end-to-end case studies for developing and implementing artificial intelligence in clinical practice. View details
    Predicting OCT-derived DME grades from fundus photographs using deep learning
    Arunachalam Narayanaswamy
    Avinash Vaidyanathan Varadarajan
    Dr. Paisan Raumviboonsuk
    Dr. Peranut Chotcomwongse
    Jorge Cuadros
    Lily Hao Yi Peng
    Pearse Keane
    Nature Communications (2020)
    Preview abstract Diabetic eye disease is one of the fastest growing causes of preventable blindness. With the advent of anti-VEGF therapies, it has become increasingly important to detect center-involved DME (ci-DME). However, ci-DME is diagnosed using optical coherence tomography (OCT), which is not generally available at screening sites. Instead, screening programs rely on the detection of hard exudates as a proxy for DME on color fundus photographs, but this often results in a fair number of false positive and false negative calls. We trained a deep learning model to use color fundus images to directly predict grades derived from OCT exams for DME. Our OCT-based model had an AUC of 0.89 (95% CI: 0.87-0.91), which corresponds to a sensitivity of 85% at a specificity of 80%. In comparison, the ophthalmology graders had sensitivities ranging from 82%-85% and specificities ranging from 44%-50%. These metrics correspond to a PPV of 61% (95% CI: 56%-66%) for the OCT-based algorithm and a range of 36-38% (95% CI ranging from 33% -42%) for ophthalmologists. In addition, we used multiple attention techniques to explain how the model is making its prediction. The ability of deep learning algorithms to make clinically relevant predictions that generally requires sophisticated 3D-imaging equipment from simple 2D images has broad relevance to many other applications in medical imaging. View details
    Predicting conversion to wet age-related macular degeneration using deep learning
    Jason Yim
    Reena Chopra
    Terry Spitz
    Jim Winkens
    Annette Obika
    Christopher Kelly
    Harry Askham
    Marko Lukic
    Josef Huemer
    Katrin Fasler
    Gabriella Moraes
    Clemens Meyer
    Marc Wilson
    Jonathan Mark Dixon
    Cían Hughes
    Geraint Rees
    Peng Khaw
    Dominic King
    Demis Hassabis
    Mustafa Suleyman
    Trevor John Back
    Pearse Keane
    Jeffrey De Fauw
    Nature Medicine (2020)
    Preview abstract Progression to exudative ‘wet’ age-related macular degeneration (exAMD) is a major cause of visual deterioration. In patients diagnosed with exAMD in one eye, we introduce an artificial intelligence (AI) system to predict progression to exAMD in the second eye. By combining models based on three-dimensional (3D) optical coherence tomography images and corresponding automatic tissue maps, our system predicts conversion to exAMD within a clinically actionable 6-month time window, achieving a per-volumetric-scan sensitivity of 80% at 55% specificity, and 34% sensitivity at 90% specificity. This level of performance corresponds to true positives in 78% and 41% of individual eyes, and false positives in 56% and 17% of individual eyes at the high sensitivity and high specificity points, respectively. Moreover, we show that automatic tissue segmentation can identify anatomical changes before conversion and high-risk subgroups. This AI system overcomes substantial interobserver variability in expert predictions, performing better than five out of six experts, and demonstrates the potential of using AI to predict disease progression. View details
    International evaluation of an AI system for breast cancer screening
    Scott Mayer McKinney
    Varun Yatindra Godbole
    Jonathan Godwin
    Natasha Antropova
    Hutan Ashrafian
    Trevor John Back
    Mary Chesus
    Ara Darzi
    Mozziyar Etemadi
    Florencia Garcia-Vicente
    Fiona J Gilbert
    Mark D Halling-Brown
    Demis Hassabis
    Sunny Jansen
    Christopher Kelly
    Dominic King
    David Melnick
    Hormuz Mostofi
    Lily Hao Yi Peng
    Joshua Reicher
    Bernardino Romera Paredes
    Richard Sidebottom
    Mustafa Suleyman
    Kenneth C. Young
    Jeffrey De Fauw
    Shravya Ramesh Shetty
    Nature (2020)
    Preview abstract Screening mammography aims to identify breast cancer at earlier stages of the disease, when treatment can be more successful. Despite the existence of screening programmes worldwide, the interpretation of mammograms is affected by high rates of false positives and false negatives. Here we present an artificial intelligence (AI) system that is capable of surpassing human experts in breast cancer prediction. To assess its performance in the clinical setting, we curated a large representative dataset from the UK and a large enriched dataset from the USA. We show an absolute reduction of 5.7% and 1.2% (USA and UK) in false positives and 9.4% and 2.7% in false negatives. We provide evidence of the ability of the system to generalize from the UK to the USA. In an independent study of six radiologists, the AI system outperformed all of the human readers: the area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. We ran a simulation in which the AI system participated in the double-reading process that is used in the UK, and found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%. This robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening. View details
    A clinically applicable approach to continuous prediction of future acute kidney injury
    Nenad Tomašev
    Xavier Glorot
    Jack W Rae
    Michal Zielinski
    Harry Askham
    Andre Saraiva
    Anne Mottram
    Clemens Meyer
    Suman Ravuri
    Alistair Connell
    Cían O Hughes
    Julien Cornebise
    Hugh Montgomery
    Geraint Rees
    Chris Laing
    Clifton R Baker
    Kelly Peterson
    Ruth Reeves
    Demis Hassabis
    Dominic King
    Mustafa Suleyman
    Trevor Back
    Christopher Nielson
    Shakir Mohamed
    Nature, 572 (2019), pp. 116-119
    Preview abstract The early prediction of deterioration could have an important role in supporting healthcare professionals, as an estimated 11% of deaths in hospital follow a failure to promptly recognize and treat deteriorating patients. To achieve this goal requires predictions of patient risk that are continuously updated and accurate, and delivered at an individual level with sufficient context and enough time to act. Here we develop a deep learning approach for the continuous risk prediction of future deterioration in patients, building on recent work that models adverse events from electronic health records and using acute kidney injury—a common and potentially life-threatening condition—as an exemplar. Our model was developed on a large, longitudinal dataset of electronic health records that cover diverse clinical environments, comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. Our model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of all acute kidney injuries that required subsequent administration of dialysis, with a lead time of up to 48 h and a ratio of 2 false alerts for every true alert. In addition to predicting future acute kidney injury, our model provides confidence assessments and a list of the clinical features that are most salient to each prediction, alongside predicted future trajectories for clinically relevant blood tests. Although the recognition and prompt treatment of acute kidney injury is known to be challenging, our approach may offer opportunities for identifying patients at risk within a time window that enables early treatment. View details