Joe Ledsam
Joe is a clinician scientist at Google Japan focusing on the application of AI to health and science. Prior to joining Google Japan, Joe spent four years at DeepMind leading multiple research projects across medical imaging and electronic health records, as well as founding the DeepMind Genomics team. Joe remains an active collaborator with Google Health, DeepMind, and other research groups throughout Google. He obtained his medical degree from the University of Leeds, UK, and was a research fellow at University College London during his years in clinical practice.
Authored Publications
Assistive AI in Lung Cancer Screening: A Retrospective Multinational Study in the United States and Japan
Atilla Kiraly
Corbin Cunningham
Ryan Najafi
Jie Yang
Chuck Lau
Diego Ardila
Scott Mayer McKinney
Rory Pilgrim
Mozziyar Etemadi
Sunny Jansen
Lily Peng
Shravya Shetty
Neeral Beladia
Krish Eswaran
Radiology: Artificial Intelligence (2024)
Lung cancer is the leading cause of cancer death worldwide, with 1.8 million deaths in 2020 [1]. Studies have concluded that low-dose computed tomography lung cancer screening can reduce mortality by up to 61% [2], and updated 2021 US guidelines expanded eligibility. As screening efforts rise, AI can play an important role but must be unobtrusively integrated into existing clinical workflows. In this work, we introduce a state-of-the-art, cloud-based AI system that provides lung cancer risk assessments without requiring any user input. We demonstrate its efficacy in assisting lung cancer screening in both US and Japanese screening settings, using different patient populations and screening protocols. Technical improvements over a previously described system include a focus on earlier cancer detection for improved accuracy, an effective assistive user interface, and a system designed to integrate into typical clinical workflows. The stand-alone AI system was evaluated on 3,085 individuals, achieving area under the curve (AUC) scores of 91.7% (95% CI [89.6, 95.2]), 93.3% (95% CI [90.2, 95.7]), and 89.1% (95% CI [77.7, 97.3]) on three datasets (two from the US and one from Japan), respectively. To evaluate the system's assistive ability, we conducted two retrospective multi-reader multi-case studies on 627 cases read by experienced board-certified radiologists (average 20 years of experience, range [7, 40]) using local PACS systems in the respective US and Japanese screening settings. The studies measured the readers' level of suspicion (LoS) and categorical responses for scores and management recommendations under country-specific screening protocols. The radiologists' AUC for LoS increased with AI assistance by 2.3% (95% CI [0.1, 4.5], p=0.022) for the US study and by 2.3% (95% CI [-3.5, 8.1], p=0.179) for the Japan study. Specificity for recalls increased by 5.5% (95% CI [2.7, 8.5], p<0.0001) for the US study and by 6.7% (95% CI [4.7, 8.7], p<0.0001) for the Japan study. No significant reduction in other metrics occurred. This work advances the state of the art in lung cancer detection, introduces generalizable interface concepts applicable to similar AI applications, and demonstrates the potential impact of diagnostic AI in global lung cancer screening, with results suggesting a substantial drop in unnecessary follow-up procedures without impacting sensitivity.
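The stand-alone results above are reported as AUC. A minimal sketch of what that figure measures: AUC equals the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative one (the Mann-Whitney formulation, with ties counted as half). This is the generic textbook definition, not the study's evaluation code, and the toy scores and labels are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a correctly ranked pair. This pairwise version is
    O(P * N) and is meant for clarity, not for large datasets.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = cancer, 0 = no cancer.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_mann_whitney(scores, labels))  # 8 of 9 pairs ranked correctly: 0.888...
```

Confidence intervals like those quoted above are commonly obtained by bootstrapping this statistic over cases; the abstract does not state which method the study used.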
Artificial intelligence as a second reader for screening mammography
Etsuji Nakai
Alessandro Scoccia Pappagallo
Hiroki Kayama
Lin Yang
Shawn Xu
Christopher Kelly
Timo Kohlberger
Daniel Golden
Akib Uddin
Radiology Advances, 1(2) (2024)
Background
Artificial intelligence (AI) has shown promise in mammography interpretation, and its use as a second reader in breast cancer screening may reduce the burden on health care systems.
Purpose
To evaluate the performance differences between routine double reading and an AI-as-second-reader (AISR) workflow, in which the second reader is replaced by AI.
Materials and Methods
A cohort of patients undergoing routine breast cancer screening with mammography at a single center between 2005 and 2021 was retrospectively collected. A model developed on US and UK data was fine-tuned on Japanese data. We subsequently performed a reader study with 10 qualified readers of varied experience (5 reader pairs), comparing routine double reading to an AISR workflow.
Results
A "test set" of 4,059 women (mean age, 56 ± 14 years; 157 positive, 3,902 negative) was collected, with 278 (mean age, 55 ± 13 years; 90 positive, 188 negative) evaluated in the reader study. We demonstrate an area under the curve of 0.84 (95% confidence interval [CI], 0.805-0.881) on the test set, with no significant difference from decisions made in clinical practice (P = .32). Compared with routine double reading, in the AISR arm sensitivity improved by 7.6% (95% CI, 3.80-11.4; P = .00004) and specificity decreased by 3.4% (1.42-5.43; P = .0016), with 71% (212/298) of scans no longer requiring input from a second reader. Agreement in recall decisions between reader pairs improved from a Cohen kappa of κ = .65 (96% CI, 0.61-0.68) to κ = .74 (96% CI, 0.71-0.77) in the AISR arm.
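Agreement between reader pairs above is reported as Cohen's kappa, which rescales raw agreement p_o by the agreement p_e expected by chance from each reader's own decision rates: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch for two readers' categorical decisions; the toy recall decisions are hypothetical, and the study's confidence-interval method is not described in the abstract.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same cases."""
    n = len(rater_a)
    if n == 0 or len(rater_b) != n:
        raise ValueError("need two non-empty label sequences of equal length")
    # Observed agreement p_o: fraction of cases given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical recall decisions (1 = recall) from one reader pair on 8 cases.
reader_1 = [1, 1, 0, 0, 1, 0, 0, 0]
reader_2 = [1, 0, 0, 0, 1, 0, 1, 0]
print(cohen_kappa(reader_1, reader_2))  # p_o = 0.75, p_e = 0.53125, kappa = 7/15 ≈ 0.467
```

Here the readers agree on 75% of cases, but because each recalls 3 of 8 cases, about 53% agreement would be expected by chance alone, so kappa sits well below raw agreement.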
Google and DeepMind: Deep Learning Systems in Ophthalmology
Xinle Liu
Akinori Mitani
Terry Spitz
Derek Wu
Artificial Intelligence in Ophthalmology (2021)
Deep learning has profound potential to improve patient outcomes. Realizing it requires a holistic, patient-centered approach. In ophthalmology, artificial intelligence studies have spanned a diverse spectrum, including algorithm development, human-computer interaction, clinical validation, and novel biomarker discovery. In this chapter we highlight the work of Google and DeepMind in these areas as a set of end-to-end case studies for developing and implementing artificial intelligence in clinical practice.
Clinically Applicable Segmentation of Head and Neck Anatomy for Radiotherapy: Deep Learning Algorithm Development and Validation Study
Stanislav Nikolov
Sam Blackwell
Alexei Zverovitch
Ruheena Mendes
Michelle Livne
Jeffrey De Fauw
Yojan Patel
Clemens Meyer
Harry Askham
Bernardino Romera Paredes
Christopher Kelly
Carlton Chu
Dawn Carnell
Cheng Boon
Derek D'Souza
Syed Moinuddin
Yasmin Mcquinlan
Sarah Ireland
Kiarna Hampton
Krystle Fuller
Hugh Montgomery
Geraint Rees
Mustafa Suleyman
Trevor John Back
Cían Hughes
Olaf Ronneberger
JMIR (2021)
Background:
Over half a million individuals are diagnosed with head and neck cancer each year globally. Radiotherapy is an important curative treatment for this disease, but it requires time-consuming manual delineation of radiosensitive organs at risk. This planning process can delay treatment and also introduces interoperator variability, resulting in downstream differences in radiation dose. Although auto-segmentation algorithms offer a potentially time-saving solution, challenges remain in defining, quantifying, and achieving expert performance.
Objective:
Adopting a deep learning approach, we aim to demonstrate a 3D U-Net architecture that achieves expert-level performance in delineating 21 distinct head and neck organs at risk commonly segmented in clinical practice.
Methods:
The model was trained on a data set of 663 deidentified computed tomography scans acquired in routine clinical practice, with segmentations both taken from clinical practice and created by experienced radiographers as part of this research, all in accordance with consensus organ at risk definitions.
Results:
We demonstrated the model's clinical applicability by assessing its performance on a test set of 21 computed tomography scans from clinical practice, each with 21 organs at risk segmented by 2 independent experts. We also introduced the surface Dice similarity coefficient, a new metric for comparing organ delineations, which quantifies the deviation between organ at risk surface contours rather than volumes and thereby better reflects the clinical task of correcting errors in automated organ segmentations. The model's generalizability was then demonstrated on 2 distinct open-source data sets, drawn from centers and countries different from those used for model training.
Conclusions:
Deep learning is an effective and clinically applicable technique for the segmentation of the head and neck anatomy for radiotherapy. With appropriate validation studies and regulatory approvals, this system could improve the efficiency, consistency, and safety of radiotherapy pathways.
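The surface Dice similarity coefficient described above scores the fraction of two surfaces that lie within a tolerance of each other, rather than their volumetric overlap. A minimal sketch, assuming each surface is given as an array of uniformly sampled 3D points in millimetres; the published metric weights surface elements by area, so counting points is a simplification, and the function name, toy contours, and tolerance value are illustrative assumptions.

```python
import numpy as np


def surface_dice(points_a: np.ndarray, points_b: np.ndarray, tolerance_mm: float) -> float:
    """Approximate surface Dice at a given tolerance for two sampled surfaces.

    points_a, points_b: arrays of shape (N, 3) and (M, 3) holding surface points in mm.
    A point counts as agreeing if its nearest point on the other surface lies
    within tolerance_mm. Uniform sampling is assumed, so points stand in for area.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances, shape (N, M). Brute force, fine for small examples.
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_within = np.count_nonzero(dists.min(axis=1) <= tolerance_mm)
    b_within = np.count_nonzero(dists.min(axis=0) <= tolerance_mm)
    return (a_within + b_within) / (len(a) + len(b))


# Hypothetical contours that agree everywhere except at one point on each surface.
expert = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
model = np.array([[0.0, 0.5, 0.0], [1.0, 0.5, 0.0], [2.0, 3.0, 0.0]])
print(surface_dice(expert, model, tolerance_mm=1.0))  # 4 of 6 points agree: 0.666...
```

Because only boundary points outside the tolerance are penalised, the score tracks how much contour would need correcting, which is the motivation the abstract gives for preferring it over volumetric Dice.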
Use of deep learning to develop continuous-risk models for adverse event prediction from electronic health records
Nenad Tomašev
Sebastien Baur
Anne Mottram
Xavier Glorot
Jack William Rae
Michal Zielinski
Harry Askham
Andre Saraiva
Valerio Magliulo
Clemens Meyer
Suman Venkatesh Ravuri
Alistair Connell
Cían Hughes
Julien Cornebise
Hugh Montgomery
Geraint Rees
Christopher Laing
Clifton R. Baker
Thomas Osborne
Ruth Reeves
Demis Hassabis
Dominic King
Mustafa Suleyman
Trevor John Back
Christopher Nielsen
Martin Gamunu Seneviratne
Shakir Mohamed
Nature Protocols (2021)
Early prediction of patient outcomes is important for targeting preventive care. This protocol describes a practical workflow for developing deep-learning risk models that can predict various clinical and operational outcomes from structured electronic health record (EHR) data. The protocol comprises five main stages: formal problem definition, data pre-processing, architecture selection, calibration and uncertainty, and generalizability evaluation. We have applied the workflow to four endpoints (acute kidney injury, mortality, length of stay, and 30-day hospital readmission). The workflow can enable continuous (e.g., triggered every 6 h) and static (e.g., triggered 24 h after admission) predictions. We also provide an open-source codebase that illustrates some key principles in EHR modeling. This protocol can be used by interdisciplinary teams with programming and clinical expertise to build deep-learning prediction models with alternative data sources and prediction tasks.
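The protocol distinguishes continuous predictions, re-issued at a fixed cadence during a stay, from static ones, issued once at a fixed time after admission. A minimal sketch of the two trigger schedules using the 6 h and 24 h examples from the abstract; the function names and the choice to fire the first continuous prediction one interval after admission are assumptions, not details of the published protocol or its open-source codebase.

```python
from datetime import datetime, timedelta
from typing import List


def continuous_trigger_times(admission: datetime, discharge: datetime,
                             every: timedelta = timedelta(hours=6)) -> List[datetime]:
    """Times at which a continuous model would emit a prediction.

    Assumption: the first prediction fires one interval after admission and
    predictions repeat until discharge (inclusive).
    """
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    times: List[datetime] = []
    t = admission + every
    while t <= discharge:
        times.append(t)
        t += every
    return times


def static_trigger_times(admission: datetime, discharge: datetime,
                         after: timedelta = timedelta(hours=24)) -> List[datetime]:
    """A single prediction at a fixed offset after admission, if the patient is still admitted."""
    t = admission + after
    return [t] if t <= discharge else []


# Hypothetical 26-hour admission.
adm = datetime(2021, 1, 1, 8, 0)
dis = datetime(2021, 1, 2, 10, 0)
print([t.strftime("%d %H:%M") for t in continuous_trigger_times(adm, dis)])
# ['01 14:00', '01 20:00', '02 02:00', '02 08:00']
print(static_trigger_times(adm, dis))  # one prediction at 2021-01-02 08:00
```

The practical difference is in labelling: a continuous model needs an outcome label for every trigger time, whereas a static model needs only one per admission.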
A prospective evaluation of AI-augmented epidemiology to forecast COVID-19 in the USA and Japan
Joel Shor
Arkady Epshteyn
Ashwin Sura Ravi
Beth Luan
Chun-Liang Li
Daisuke Yoneoka
Dario Sava
Hiroaki Miyata
Hiroki Kayama
Isaac Jones
Joe Mckenna
Johan Euphrosine
Kris Popendorf
Nate Yoder
Shashank Singh
Shuhei Nomura
Thomas Tsai
npj Digital Medicine (2021)
The COVID-19 pandemic has highlighted the global need for reliable models of disease spread. We evaluate an AI-improved forecasting approach that provides daily predictions of the expected number of confirmed COVID-19 deaths, cases, and hospitalizations during the following 28 days. We present an international, prospective evaluation of model performance across all states and counties in the USA and prefectures in Japan. National mean absolute percentage error (MAPE) for predicting COVID-19-associated deaths before and after prospective deployment remained consistently <3% (US) and <10% (Japan). Average statewide (US) and prefecture-wide (Japan) MAPE was 6% and 20%, respectively (14% when considering only prefectures with more than 10 deaths). We show that our model performs well even during periods of considerable change in population behavior and that it is robust to demographic differences across geographic locations. We further demonstrate that the model provides meaningful explanatory insights, responding appropriately to local and national policy interventions. Our model enables counterfactual simulations, which indicate that continuing non-pharmaceutical interventions (NPIs) alongside vaccination is essential for a more rapid recovery from the pandemic and that delaying interventions has a detrimental effect; these simulations also allow exploration of the consequences of different vaccination strategies. The COVID-19 pandemic remains a global emergency. In the face of substantial challenges ahead, the approach presented here has the potential to inform critical decisions.
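Forecast accuracy above is summarised as mean absolute percentage error (MAPE). A minimal sketch of the standard definition; the abstract does not specify which quantity and horizon the study averaged over (for example, cumulative deaths over the 28-day window), so the inputs here are assumptions and the counts are hypothetical.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, so this sketch rejects such inputs.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical death counts for three regions (not data from the study).
actual = [100, 200, 50]
predicted = [110, 190, 50]
print(mape(actual, predicted))  # (10% + 5% + 0%) / 3 ≈ 5.0
```

Small denominators inflate the metric: a count of 2 forecast as 4 contributes a 100% error, while 200 forecast as 202 contributes 1%. This is one plausible reason the abstract also reports prefecture-level MAPE restricted to prefectures with more than 10 deaths.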
Predicting OCT-derived DME grades from fundus photographs using deep learning
Arunachalam Narayanaswamy
Avinash Vaidyanathan Varadarajan
Paisan Raumviboonsuk
Peranut Chotcomwongse
Jorge Cuadros
Lily Hao Yi Peng
Pearse Keane
Nature Communications (2020)
Diabetic eye disease is one of the fastest-growing causes of preventable blindness. With the advent of anti-VEGF therapies, it has become increasingly important to detect center-involved diabetic macular edema (ci-DME). However, ci-DME is diagnosed using optical coherence tomography (OCT), which is not generally available at screening sites. Instead, screening programs rely on the detection of hard exudates on color fundus photographs as a proxy for DME, but this often results in a substantial number of false-positive and false-negative calls. We trained a deep learning model to use color fundus images to directly predict DME grades derived from OCT exams. Our OCT-based model had an AUC of 0.89 (95% CI: 0.87-0.91), which corresponds to a sensitivity of 85% at a specificity of 80%. In comparison, the ophthalmology graders had sensitivities ranging from 82% to 85% and specificities ranging from 44% to 50%. These metrics correspond to a PPV of 61% (95% CI: 56%-66%) for the OCT-based algorithm and a range of 36%-38% (95% CIs ranging from 33% to 42%) for ophthalmologists. In addition, we used multiple attention techniques to explain how the model makes its predictions. The ability of deep learning algorithms to make, from simple 2D images, clinically relevant predictions that generally require sophisticated 3D-imaging equipment has broad relevance to many other applications in medical imaging.
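Sensitivity and specificity alone do not fix PPV; it also depends on how common ci-DME is in the evaluated population. A minimal sketch of that relationship via Bayes' rule. The abstract does not report prevalence, so the 27% used below is an illustrative assumption; at that value the formula gives PPVs close to those quoted for the model and for a grader within the reported ranges.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence (Bayes' rule)."""
    for name, v in (("sensitivity", sensitivity), ("specificity", specificity),
                    ("prevalence", prevalence)):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    true_pos = sensitivity * prevalence                    # P(test+, disease+)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(test+, disease-)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive calls, so PPV is undefined")
    return true_pos / (true_pos + false_pos)


PREVALENCE = 0.27  # assumption for illustration, not a figure from the abstract
print(round(ppv(0.85, 0.80, PREVALENCE), 2))  # model operating point: 0.61
print(round(ppv(0.85, 0.47, PREVALENCE), 2))  # hypothetical grader within the reported ranges: 0.37
```

At equal sensitivity, the gap in PPV comes entirely from the false-positive term, which is why the graders' 44%-50% specificity roughly halves their PPV relative to the model.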
Predicting conversion to wet age-related macular degeneration using deep learning
Jason Yim
Reena Chopra
Terry Spitz
Jim Winkens
Annette Obika
Christopher Kelly
Harry Askham
Marko Lukic
Josef Huemer
Katrin Fasler
Gabriella Moraes
Clemens Meyer
Marc Wilson
Jonathan Mark Dixon
Cían Hughes
Geraint Rees
Peng Khaw
Dominic King
Demis Hassabis
Mustafa Suleyman
Trevor John Back
Pearse Keane
Jeffrey De Fauw
Nature Medicine (2020)
Progression to exudative ‘wet’ age-related macular degeneration (exAMD) is a major cause of visual deterioration. In patients diagnosed with exAMD in one eye, we introduce an artificial intelligence (AI) system to predict progression to exAMD in the second eye. By combining models based on three-dimensional (3D) optical coherence tomography images and corresponding automatic tissue maps, our system predicts conversion to exAMD within a clinically actionable 6-month time window, achieving a per-volumetric-scan sensitivity of 80% at 55% specificity, and 34% sensitivity at 90% specificity. This level of performance corresponds to true positives in 78% and 41% of individual eyes, and false positives in 56% and 17% of individual eyes at the high sensitivity and high specificity points, respectively. Moreover, we show that automatic tissue segmentation can identify anatomical changes before conversion and high-risk subgroups. This AI system overcomes substantial interobserver variability in expert predictions, performing better than five out of six experts, and demonstrates the potential of using AI to predict disease progression.
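The abstract reports two operating points from the same model: 80% sensitivity at 55% specificity, and 34% sensitivity at 90% specificity. A minimal sketch of how such points can be read off one continuous risk score by picking the lowest threshold that meets a target specificity. This is a generic procedure on hypothetical toy data, not the study's threshold-selection method, which may, for example, have fixed thresholds on a separate tuning set.

```python
from typing import Sequence, Tuple


def operating_point(scores: Sequence[float], labels: Sequence[int],
                    target_specificity: float) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) at the lowest threshold whose
    specificity meets the target. A case is called positive if score >= threshold.

    Specificity can only rise as the threshold rises, so the first threshold that
    meets the target keeps sensitivity as high as possible.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Candidate thresholds: every observed score, plus +inf (call nothing positive).
    for t in sorted(set(scores)) + [float("inf")]:
        spec = sum(n < t for n in neg) / len(neg)
        if spec >= target_specificity:
            sens = sum(p >= t for p in pos) / len(pos)
            return t, sens, spec
    raise ValueError("target specificity must be <= 1")


# Hypothetical per-scan risk scores; 1 = converted within the time window.
scores = [0.95, 0.85, 0.7, 0.4, 0.9, 0.6, 0.5, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(operating_point(scores, labels, 0.5))  # high-sensitivity point: (0.4, 1.0, 0.5)
print(operating_point(scores, labels, 0.8))  # high-specificity point: (0.7, 0.75, 0.833...)
```

Raising the specificity target moves the threshold up and trades away sensitivity, mirroring the two points quoted in the abstract.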
International evaluation of an AI system for breast cancer screening
Scott Mayer McKinney
Varun Yatindra Godbole
Jonathan Godwin
Natasha Antropova
Hutan Ashrafian
Trevor John Back
Mary Chesus
Ara Darzi
Mozziyar Etemadi
Florencia Garcia-Vicente
Fiona J Gilbert
Mark D Halling-Brown
Demis Hassabis
Sunny Jansen
Christopher Kelly
Dominic King
David Melnick
Hormuz Mostofi
Lily Hao Yi Peng
Joshua Reicher
Bernardino Romera Paredes
Richard Sidebottom
Mustafa Suleyman
Kenneth C. Young
Jeffrey De Fauw
Shravya Ramesh Shetty
Nature (2020)
Screening mammography aims to identify breast cancer at earlier stages of the disease, when treatment can be more successful. Despite the existence of screening programmes worldwide, the interpretation of mammograms is affected by high rates of false positives and false negatives. Here we present an artificial intelligence (AI) system that is capable of surpassing human experts in breast cancer prediction. To assess its performance in the clinical setting, we curated a large representative dataset from the UK and a large enriched dataset from the USA. We show an absolute reduction of 5.7% and 1.2% (USA and UK) in false positives and 9.4% and 2.7% in false negatives. We provide evidence of the ability of the system to generalize from the UK to the USA. In an independent study of six radiologists, the AI system outperformed all of the human readers: the area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. We ran a simulation in which the AI system participated in the double-reading process that is used in the UK, and found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%. This robust assessment of the AI system paves the way for clinical trials to improve the accuracy and efficiency of breast cancer screening.
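The double-reading simulation above lets the AI system take the second reader's place. A minimal sketch of one plausible protocol, assuming that when the first reader and the AI agree their shared decision is final and that disagreements fall back to a human second reader. The abstract does not spell out the arbitration rules, so the function, this fallback, and the toy decisions are assumptions. The second return value, the fraction of cases the human second reader no longer reads, is one way to express a workload reduction such as the 88% quoted above.

```python
from typing import List, Sequence, Tuple


def ai_as_second_reader(first_reader: Sequence[bool], ai: Sequence[bool],
                        human_second_reader: Sequence[bool]) -> Tuple[List[bool], float]:
    """Simulate double reading with the AI standing in for the second reader.

    Each sequence holds one recall decision (True = recall) per case.
    Returns the final decisions and the fraction of cases in which the human
    second reader was not needed.
    """
    n = len(first_reader)
    if n == 0 or len(ai) != n or len(human_second_reader) != n:
        raise ValueError("need non-empty sequences of equal length")
    final: List[bool] = []
    skipped = 0
    for r1, a, r2 in zip(first_reader, ai, human_second_reader):
        if r1 == a:
            final.append(r1)  # agreement: no human second read required
            skipped += 1
        else:
            final.append(r2)  # disagreement: defer to the human second reader (assumed rule)
    return final, skipped / n


# Hypothetical decisions for five screening cases.
first = [True, False, False, True, False]
ai = [True, False, True, False, False]
second = [True, False, False, True, False]
decisions, workload_saved = ai_as_second_reader(first, ai, second)
print(decisions, workload_saved)  # [True, False, False, True, False] 0.6
```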
A clinically applicable approach to continuous prediction of future acute kidney injury
Nenad Tomašev
Xavier Glorot
Jack W Rae
Michal Zielinski
Harry Askham
Andre Saraiva
Anne Mottram
Clemens Meyer
Suman Ravuri
Alistair Connell
Cían O Hughes
Julien Cornebise
Hugh Montgomery
Geraint Rees
Chris Laing
Clifton R Baker
Kelly Peterson
Ruth Reeves
Demis Hassabis
Dominic King
Mustafa Suleyman
Trevor Back
Christopher Nielson
Shakir Mohamed
Nature, 572 (2019), pp. 116-119
The early prediction of deterioration could have an important role in supporting healthcare professionals, as an estimated 11% of deaths in hospital follow a failure to promptly recognize and treat deteriorating patients. To achieve this goal requires predictions of patient risk that are continuously updated and accurate, and delivered at an individual level with sufficient context and enough time to act. Here we develop a deep learning approach for the continuous risk prediction of future deterioration in patients, building on recent work that models adverse events from electronic health records and using acute kidney injury—a common and potentially life-threatening condition—as an exemplar. Our model was developed on a large, longitudinal dataset of electronic health records that cover diverse clinical environments, comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. Our model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of all acute kidney injuries that required subsequent administration of dialysis, with a lead time of up to 48 h and a ratio of 2 false alerts for every true alert. In addition to predicting future acute kidney injury, our model provides confidence assessments and a list of the clinical features that are most salient to each prediction, alongside predicted future trajectories for clinically relevant blood tests. Although the recognition and prompt treatment of acute kidney injury is known to be challenging, our approach may offer opportunities for identifying patients at risk within a time window that enables early treatment.
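The headline figures above combine an episode-level sensitivity with an alert-level false-alarm ratio; 2 false alerts for every true alert corresponds to an alert precision of 1/3. A minimal sketch of both quantities for a single patient timeline, assuming an alert is true if an AKI onset follows it within the 48 h lead window and an episode counts as predicted if any alert precedes it within that window. The paper's exact labelling rules may differ, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence

LEAD = timedelta(hours=48)  # lead-time window taken from the abstract


def episode_sensitivity(onsets: Sequence[datetime], alerts: Sequence[datetime],
                        lead: timedelta = LEAD) -> float:
    """Fraction of AKI onsets preceded by at least one alert within `lead`."""
    if not onsets:
        raise ValueError("need at least one AKI onset")
    hits = sum(any(onset - lead <= a < onset for a in alerts) for onset in onsets)
    return hits / len(onsets)


def alert_precision(onsets: Sequence[datetime], alerts: Sequence[datetime],
                    lead: timedelta = LEAD) -> float:
    """Fraction of alerts followed by an AKI onset within `lead`."""
    if not alerts:
        raise ValueError("need at least one alert")
    true_alerts = sum(any(a < onset <= a + lead for onset in onsets) for a in alerts)
    return true_alerts / len(alerts)


# Hypothetical single-patient timeline.
onsets = [datetime(2019, 3, 3, 12), datetime(2019, 3, 10, 0)]
alerts = [datetime(2019, 3, 2, 6),  # 30 h before the first onset: true alert
          datetime(2019, 3, 5, 0),  # no onset within 48 h: false alert
          datetime(2019, 3, 7, 0)]  # 72 h before the second onset: too early, false alert
print(episode_sensitivity(onsets, alerts))  # 0.5
print(alert_precision(onsets, alerts))      # 0.333..., i.e. 2 false alerts per true alert
```

The two conditions are mirror images of each other (an alert at time a and an onset at time o match when a < o <= a + 48 h), so the same window defines both which episodes are caught and which alerts count as true.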