Jump to Content

Performance of a Diabetic Retinopathy Artificial Intelligence Algorithm for Ultra-widefield Imaging

Tunde Peto
Lloyd Aiello
Srinivas R Sadda
Drew Lewis
Anne Marie Cairns
Dana Keane
Sunny Virmani
Jerry Cavallerano
Barba Hamill
Lily Peng
Sara Ellen Godek
Lu Yang
Naho Kitade
Kira Whitehouse
ARVO (2022)
Google Scholar

Abstract

Purpose: To evaluate the performance of a deep learning model for diabetic retinopathy (DR) and diabetic macular edema screening when using ultra-widefield (UWF) imaging. Methods: For model development, 67,200 UWF images were collected from DR programs and ophthalmology clinics worldwide. 30,836 images were double graded and adjudicated at 8 grading centres by 125 certified graders using ETDRS extension of the Modified Airlie House Classification of Diabetic Retinopathy following the JVN Clinical Trial Ultrawide Field Grading Manual v1.0. The grading system used traditional ETDRS 7-SF field definition as well as extended fields 3-7 to evaluate the retinal periphery. A further 36,364 UWF images were graded using a grading protocol based on the ICDR classification. The dataset was split into training, tuning and testing. The final DR model is an ensemble of 10 EfficientNet-b0 neural networks, independently trained with standard image augmentation techniques. For model validation, two independent sets of images were collected. Model performance was evaluated by comparing its predictions to the adjudicated ground truth for both sets of images. Results: Prior to clinical validation, the model performance was internally evaluated on an independent set of 1967 images, of which 1050 were graded via adjudication as negative for more than mild diabetic retinopathy (mtmDR negative), and 917 as having referable diabetic retinopathy (mtmDR positive). The overall performance (Table 1) was weighted by target DR distribution. Clinical validation evaluated an independent data set of 420 images selected to achieve a target distribution that enabled appropriate confidence intervals for mtmDR sensitivity and specificity A panel of three graders adjudicated these 420 images and assessed 241 as mtmDR negative, 179 as mtmDR positive and 135 as vtDR positive. Model’s performance on the clinical validation set is shown in Table 2. Conclusions: The deep learning model was developed with high quality graded UWF images and performed at a level that highly suggests usefulness in a clinical screening setting. A large, prospective multi-center clinical trial is currently evaluating the performance of a similar model in a real-world clinical setting. This abstract was presented at the 2022 ARVO Annual Meeting, held in Denver, CO, May 1-4, 2022, and virtually.

Research Areas