- Diego Ardila
- Bokyung Choi
- Atilla Peter Kiraly
- Sujeeth Bharadwaj
- Joshua Reicher
- Greg Corrado
- Daniel Tse
- Lily Peng
- Shravya Ramesh Shetty
PURPOSE Evaluate the utility of deep learning to improve the specificity and sensitivity of lung cancer screening with low-dose helical computed tomography (LDCT), relative to the Lung-RADS guidelines.
METHOD AND MATERIALS We analyzed 42,943 CT studies from 14,863 patients, 620 of whom developed biopsy-confirmed cancer. All cases were from the National Lung Screening Trial (NLST). We randomly split patients into training (70%), tuning (15%), and test (15%) sets. A study was labeled "true" if the patient was diagnosed with biopsy-confirmed lung cancer in the same screening year as the study. A deep learning model was trained on 3D CT volumes (400x512x512) as input. We selected the 95% specificity operating point on the tuning set and evaluated our approach on the test set. To estimate radiologist performance, we retrospectively applied Lung-RADS criteria to each study in the test set. Lung-RADS categories 1 to 2 constitute negative screening results, and categories 3 to 4 constitute positive results. Neither the model nor the Lung-RADS results took prior studies into account, but studies from all screening years were used in evaluation.
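The evaluation protocol described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the random seed, the representation of Lung-RADS categories as integers 1 to 4, and the rule for choosing the threshold (the smallest score that keeps at least 95% of negative tuning studies at or below it) are all hypothetical.

```python
import math
import random


def split_patients(patient_ids, seed=0):
    """Split unique patient IDs 70/15/15 so that every study from one
    patient lands in the same set (hypothetical helper)."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 70 // 100
    n_tune = len(ids) * 15 // 100
    train = set(ids[:n_train])
    tune = set(ids[n_train:n_train + n_tune])
    test = set(ids[n_train + n_tune:])
    return train, tune, test


def threshold_at_specificity(negative_scores, target=0.95):
    """Return a score threshold such that at least `target` of the
    negative tuning-set scores are <= threshold. A study is called
    positive when its score is strictly greater than the threshold."""
    ordered = sorted(negative_scores)
    k = math.ceil(target * len(ordered))  # negatives that must stay negative
    return ordered[k - 1]


def lung_rads_positive(category):
    """Binarize a Lung-RADS category: 1-2 negative, 3-4 positive.
    Assumes the category is given as an integer from 1 to 4."""
    return category >= 3
```

Splitting by patient rather than by study keeps repeat scans of the same person out of both the training and test sets, which is what "randomly split patients" implies.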
RESULTS The area under the receiver operating characteristic curve of the deep learning model was 94.2% (95% CI 91.0, 96.9). Compared to Lung-RADS on the test set, the trained model achieved a statistically significant absolute 9.2% (95% CI 8.4, 10.1) higher specificity and a 3.4% (95% CI -5.2, 12.6) higher sensitivity, which was not statistically significant. Radiologists qualitatively reviewed disagreements between the model and Lung-RADS. Preliminary analysis suggests that the model may be superior in distinguishing scarring from early malignancy.
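The abstract does not state how the confidence intervals were computed. One common choice for a paired comparison on the same test studies is a percentile bootstrap, sketched below as an assumption rather than as the authors' method. All names are hypothetical, and the sketch resamples studies; resampling patients instead would respect the fact that one patient contributes several studies.

```python
import random


def specificity(preds, labels):
    """Fraction of cancer-negative studies that were called negative."""
    negatives = [p for p, y in zip(preds, labels) if not y]
    return sum(1 for p in negatives if not p) / len(negatives)


def sensitivity(preds, labels):
    """Fraction of cancer-positive studies that were called positive."""
    positives = [p for p, y in zip(preds, labels) if y]
    return sum(1 for p in positives if p) / len(positives)


def bootstrap_diff_ci(metric, preds_a, preds_b, labels, n_boot=2000, seed=0):
    """Percentile 95% CI for metric(A) - metric(B), resampling the same
    studies for both readers so the comparison stays paired."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # skip resamples that lack one of the two classes
        a = [preds_a[i] for i in idx]
        b = [preds_b[i] for i in idx]
        diffs.append(metric(a, ys) - metric(b, ys))
    diffs.sort()
    lower = diffs[n_boot * 25 // 1000]
    upper = diffs[n_boot * 975 // 1000 - 1]
    return lower, upper
```

Under this reading, a difference is called statistically significant when the interval excludes zero, as the reported specificity interval does and the sensitivity interval does not.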
CONCLUSION A deep learning-based model improved the specificity of lung cancer screening over Lung-RADS on the NLST dataset and could potentially help reduce unnecessary procedures. This research could supplement future versions of Lung-RADS or support assisted-read or second-read workflows.
CLINICAL RELEVANCE/APPLICATION While Lung-RADS criteria are recommended for lung cancer screening with LDCT, there is still an opportunity to reduce false-positive rates, which lead to unnecessary invasive procedures.