Deep learning to detect optical coherence tomography-derived diabetic macular edema from retinal photographs: a multicenter validation study

Xinle Sheila Liu
Tayyeba Ali
Ami Shah
Scott Mayer McKinney
Paisan Ruamviboonsuk
Angus W. Turner
Pearse A. Keane
Peranut Chotcomwongse
Variya Nganthavee
Mark Chia
Josef Huemer
Jorge Cuadros
Rajiv Raman
Lily Hao Yi Peng
Avinash Vaidyanathan Varadarajan
Reena Chopra
Ophthalmology Retina (2022)

Abstract

Purpose
To validate the generalizability of a deep learning system (DLS) that detects diabetic macular edema (DME) from two-dimensional color fundus photography (CFP), where the reference standard for retinal thickness and fluid presence is derived from three-dimensional optical coherence tomography (OCT).

Design
Retrospective validation of a DLS across international datasets.

Participants
Paired CFP and OCT of patients from diabetic retinopathy (DR) screening programs or retina clinics. The DLS was developed using datasets from Thailand, the United Kingdom (UK) and the United States and validated using 3,060 unique eyes from 1,582 patients across screening populations in Australia, India and Thailand. The DLS was separately validated in 698 eyes from 537 screened patients in the UK with mild DR and suspicion of DME based on CFP.

Methods
The DLS was trained using DME labels from OCT. Presence of DME was based on retinal thickening or intraretinal fluid. The DLS’s performance was compared to expert grades of maculopathy and to a previous proof-of-concept version of the DLS. We further simulated integration of the current DLS into an algorithm trained to detect DR from CFPs.
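As a minimal sketch of how an OCT-derived label of this kind could be formed, the snippet below marks an eye as DME-positive when the central retinal thickness meets a per-device threshold or when intraretinal fluid is present. The device names, threshold values, and field names are hypothetical placeholders, not values from this study; the actual labeling protocol may differ.

```python
from dataclasses import dataclass

# Placeholder thresholds in micrometres, keyed by hypothetical device names.
# These values are illustrative only and are NOT taken from the study.
HYPOTHETICAL_THICKNESS_THRESHOLDS_UM = {
    "device_a": 300.0,
    "device_b": 320.0,
}


@dataclass
class OctMeasurement:
    """Summary of one OCT scan (field names are assumptions)."""
    device: str
    central_thickness_um: float
    intraretinal_fluid: bool


def has_dme(measurement, thresholds=HYPOTHETICAL_THICKNESS_THRESHOLDS_UM):
    """Return True if the scan shows thickening or intraretinal fluid.

    Thickening is judged against the threshold for the scan's own device,
    since thickness readings are not directly comparable across devices.
    """
    threshold = thresholds[measurement.device]
    thickened = measurement.central_thickness_um >= threshold
    return thickened or measurement.intraretinal_fluid


# Example: a thin retina with fluid is still labeled positive.
example = OctMeasurement("device_a", 250.0, intraretinal_fluid=True)
label = has_dme(example)
```

Keeping the thresholds in a lookup keyed by device makes the device dependence explicit and lets the same rule be reused across scanners.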

Main Outcome Measures
Superiority of specificity and non-inferiority of sensitivity of the DLS for the detection of center-involving DME, using device-specific thresholds, compared to experts.

Results
Primary analysis in a combined dataset spanning Australia, India, and Thailand showed that the DLS had 80% specificity and 81% sensitivity, compared to expert graders, who had 59% specificity and 70% sensitivity. Relative to human experts, the DLS had significantly higher specificity (p=0.008) and non-inferior sensitivity (p<0.001). In the UK dataset, the DLS had a specificity of 80% (p<0.001 for specificity > 50%) and a sensitivity of 100% (p=0.02 for sensitivity > 90%).
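For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity follow from confusion-matrix counts, and how a one-sided test against a fixed benchmark (such as specificity > 50%) could be run. The abstract does not state which statistical test was used, so the exact binomial test here is an assumption, and the counts are made up for illustration rather than taken from the study.

```python
from math import comb


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of DME eyes flagged positive
    specificity = tn / (tn + fp)  # fraction of non-DME eyes flagged negative
    return sensitivity, specificity


def binomial_upper_tail(successes, trials, p0):
    """Exact P(X >= successes) for X ~ Binomial(trials, p0).

    Used as a one-sided p-value for H0: true rate <= p0.
    Suitable for moderate sample sizes; very large `trials` may
    overflow when the binomial coefficient is converted to float.
    """
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts, for illustration only (not study data).
tp, fn, tn, fp = 40, 10, 80, 20
sens, spec = sensitivity_specificity(tp, fn, tn, fp)

# One-sided test of specificity against a 50% benchmark.
p_value = binomial_upper_tail(tn, tn + fp, 0.5)
```

A small p-value here would indicate that the observed specificity is unlikely if the true specificity were 50% or lower.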

Conclusions
The DLS can generalize to multiple international populations with an accuracy exceeding that of experts. The clinical value of this DLS in reducing false-positive referrals, and thus decreasing the burden on specialist eye care, warrants prospective evaluation.