Using a deep learning algorithm and integrated gradient explanation to assist grading for diabetic retinopathy

Ankur Taly

Anthony Joseph

Arjun Sood

Arun Narayanaswamy

Dale Webster

David Devoud Coz

Derek Wu

Ehsan Rahimy

Greg Corrado

Jesse Smith

Jonathan Krause

Katy Blumer

Lily Peng

Michael Shumski

Naama Hammel

Rory Abbott Sayres

Scott Barb

Zahra Rastegar

Ophthalmology (2019)

Abstract

Background Deep learning methods have recently produced algorithms that can detect diseases such as diabetic retinopathy (DR) with doctor-level accuracy. We sought to understand the impact of these models on physician graders in assisted-read settings.

Methods We surfaced model predictions and explanation maps ("masks") to 9 ophthalmologists with varying levels of experience, each of whom read 1,804 images for DR severity based on the International Clinical Diabetic Retinopathy (ICDR) disease severity scale. The image sample was representative of the diabetic screening population and was adjudicated by 3 retina specialists to establish a reference standard. Doctors read each image in one of 3 conditions: Unassisted, Grades Only, or Grades+Masks.
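The abstract does not describe how the explanation maps were computed beyond naming integrated gradients in the title. As a rough illustration of that general technique only, the sketch below approximates integrated gradients with a midpoint Riemann sum along a straight path from a baseline to the input. The toy logistic scorer, its weights `w`, the all-zero baseline, and the names `integrated_gradients`, `score`, and `score_grad` are all assumptions made for this example and are not taken from the study.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    # Interpolation coefficients at the midpoints of `steps` equal intervals in [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Average the gradient along the straight path from the baseline to x.
    path_grads = [grad_fn(baseline + a * (x - baseline)) for a in alphas]
    avg_grad = np.mean(path_grads, axis=0)
    # Scale by the input difference to get one attribution per feature.
    return (x - baseline) * avg_grad

# Hypothetical stand-in for a model: a logistic score over three "pixels".
w = np.array([1.5, -2.0, 0.5])

def score(x):
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def score_grad(x):
    # Analytic gradient of the logistic score with respect to the input.
    s = score(x)
    return s * (1.0 - s) * w

x = np.array([0.8, 0.1, 0.4])
baseline = np.zeros_like(x)  # assumed "black image" baseline
attributions = integrated_gradients(score_grad, x, baseline)
```

A useful sanity check for any implementation is the completeness property: the attributions should sum, up to discretization error, to `score(x) - score(baseline)`. For an image model, the per-pixel attributions would be rendered as an overlay on the fundus photograph.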

Findings Readers graded DR more accurately with model assistance than without (p < 0.001, logistic regression). Compared to the adjudicated reference standard, for cases with disease, 5-class accuracy was 57.5% for the model. For graders, 5-class accuracy for cases with disease was 47.5 ± 5.6% unassisted, 56.9 ± 5.5% with Grades Only, and 61.5 ± 5.5% with Grades+Masks. Reader performance improved with assistance across all levels of DR, including for severe and proliferative DR. Model assistance increased the accuracy of retina fellows and trainees above that of the unassisted grader or model alone. Doctors’ grading confidence scores and read times both increased overall with assistance. For most cases, Grades+Masks was only as effective as Grades Only, though masks provided additional benefit over grades alone in cases with: some DR and low model certainty; low image quality; and proliferative diabetic retinopathy (PDR) with features that were frequently missed, such as panretinal photocoagulation (PRP) scars.
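To make the headline metric concrete, the following minimal sketch computes 5-class accuracy restricted to cases with disease, measured against a reference standard. It assumes ICDR grades are encoded as integers 0 to 4 with 0 meaning no apparent retinopathy. That encoding, the function name `accuracy_on_diseased`, and the example grades are illustrative assumptions, not data from the study.

```python
import numpy as np

def accuracy_on_diseased(reference, predicted):
    """Exact-match accuracy over cases whose reference grade indicates disease."""
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    # Assumed encoding: 0 = no apparent DR, 1-4 = increasing DR severity.
    diseased = reference > 0
    if not diseased.any():
        raise ValueError("no cases with disease in the reference standard")
    return float(np.mean(predicted[diseased] == reference[diseased]))

# Made-up grades for eight images (not study data).
reference = [0, 1, 2, 2, 3, 4, 0, 2]
reader = [0, 1, 2, 1, 3, 3, 1, 2]
acc = accuracy_on_diseased(reference, reader)
```

Here six of the eight images have disease according to the reference, and the reader matches the reference grade exactly on four of them. The same function could be applied to each reading condition separately to compare assisted and unassisted grading.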

Interpretation Taken together, these results show that deep learning models can improve the accuracy of, and confidence in, DR diagnosis in an assisted read setting.
