Longitudinal Screening for Diabetic Retinopathy in a Nationwide Screening Program: Comparing Deep Learning and Human Graders

Jirawut Limwattanayingyong; Variya Nganthavee; Kasem Seresirikachorn; Tassapol Singalavanija; Ngamphol Soonthornworasiri; Varis Ruamviboonsuk; Chetan Rao; Rajiv Raman; Andrzej Grzybowski; Mike Schaekermann; Lily Hao Yi Peng; Dale Richard Webster; Christopher 'sec' Semturs; Jonathan David Krause; Rory Abbott Sayres; Fred Hersch; Richa Tiwari, PhD; Yun Liu; Dr. Paisan Ruamviboonsuk

Journal of Diabetes Research (2020)

Abstract

Objective.
To evaluate diabetic retinopathy (DR) screening via deep learning (DL) and trained human graders (HG) in a longitudinal cohort, as the case spectrum shifts owing to treatment referral and new-onset DR.

Methods.
We randomly selected patients with diabetes who were screened twice, two years apart, within a nationwide screening program. The reference standard was established via adjudication by retina specialists. Each patient’s color fundus photographs were graded, and a patient was considered to have sight-threatening DR (STDR) if the worse eye had severe nonproliferative DR, proliferative DR, or diabetic macular edema. We compared two DR screening modalities: DL and HG. For each modality, we simulated treatment referral by excluding from the second screening those patients whose STDR that modality had detected at the first screening.
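The patient-level rule and the simulated referral described above can be sketched in a few lines. This is a minimal illustration, not the study's code: the grade labels, the `(dr_grade, has_dme)` eye representation, and the function names are all hypothetical, and the toy "modality" below simply flags every true STDR case.

```python
# Hypothetical grade labels; the abstract names the categories, not an encoding.
STDR_GRADES = {"severe_npdr", "pdr"}


def patient_has_stdr(eyes):
    """eyes: iterable of (dr_grade, has_dme) pairs, one per eye.

    A patient counts as STDR if the worse eye qualifies, which is the same
    as asking whether any eye has severe NPDR, PDR, or macular edema.
    """
    return any(grade in STDR_GRADES or has_dme for grade, has_dme in eyes)


def second_screening_cohort(patient_ids, flagged_at_first):
    """Simulated referral: drop patients the modality flagged at screening 1."""
    return [pid for pid in patient_ids if pid not in flagged_at_first]


# Toy data (made up for illustration only).
patients = {
    1: [("none", False), ("mild_npdr", False)],
    2: [("moderate_npdr", False), ("pdr", False)],
    3: [("mild_npdr", True), ("none", False)],
}
flagged = {pid for pid, eyes in patients.items() if patient_has_stdr(eyes)}
remaining = second_screening_cohort(patients, flagged)
```

Because each modality flags a different set of patients, running the same exclusion once per modality yields a different second-screening cohort for DL than for HG.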

Results.
There were 5,738 patients (12.3% STDR) in the first screening. DL and HG captured different numbers of STDR cases, so after simulated referral and exclusion of ungradable cases, 4,148 (DL) and 4,263 (HG) patients remained in the second screening. The STDR prevalence at the second screening was 5.1% and 6.8% for DL- and HG-based screening, respectively. Along with the decrease in prevalence, the sensitivity of both modalities decreased from the first to the second screening (DL: from 95% to 90%, p=0.008; HG: from 74% to 57%, p<0.001). At both the first and second screenings, the rate of false negatives for DL was about a fifth that of HG (0.5-0.6% vs. 2.9-3.2%).
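The reported false-negative rates can be related to the prevalence and sensitivity figures with a short calculation. This sketch assumes, as an interpretation not stated in the abstract, that the false-negative rate is the share of all screened patients who have STDR and are missed, that is, prevalence × (1 − sensitivity). The inputs are the rounded values quoted above, so the products only approximate the reported ranges.

```python
def false_negative_rate(prevalence, sensitivity):
    """Missed STDR cases as a fraction of all screened patients (assumed definition)."""
    return prevalence * (1.0 - sensitivity)


# (prevalence, sensitivity) as rounded in the abstract.
reported = {
    ("DL", "first"): (0.123, 0.95),
    ("HG", "first"): (0.123, 0.74),
    ("DL", "second"): (0.051, 0.90),
    ("HG", "second"): (0.068, 0.57),
}

fn_percent = {
    key: 100.0 * false_negative_rate(prev, sens)
    for key, (prev, sens) in reported.items()
}

for (modality, screening), value in fn_percent.items():
    print(f"{modality} {screening} screening: {value:.1f}% false negatives")
```

Under that assumption the DL figures come out near 0.5-0.6% and the HG figures near 2.9-3.2%, in line with the ranges quoted in the results.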

Conclusion.
On 2-year longitudinal follow-up of a DR screening cohort, STDR prevalence decreased for both DL- and HG-based screening. Follow-up rounds in longitudinal DR screening can be more difficult and yield lower sensitivity for both DL and HG, although the false negative rate was substantially lower for DL. Our data may be useful for health-economics analyses of longitudinal screening settings.

Research Areas

Machine intelligence
