Detection of Elusive Polyps via a Large Scale AI System
Abstract
Colorectal cancer (CRC) is the second leading cause of cancer death worldwide, resulting in an estimated 900,000 deaths per year. Colonoscopy is the gold standard for detection and removal of precancerous lesions, and has been amply shown to reduce mortality. However, the miss rate for polyps during colonoscopies is 22-28%, while 20-24% of the missed lesions are histologically confirmed adenomas. To address this shortcoming, we propose a polyp detection system based on deep learning, which can alert the operator in real time to the presence and location of polyps during a colonoscopy. We dub the system DEEP^2: DEEP DEtection of Elusive Polyps. The DEEP^2 system was trained on 3,611 hours of colonoscopy videos derived from two sources, and was validated on a set comprising 1,393 hours of video from a third, unrelated source. For the validation set, the ground-truth labelling was provided by offline GI annotators, who were able to watch the video in slow motion and pause or rewind as required; two or three such annotators examined each video.
Overall, DEEP^2 achieves a sensitivity of 96.8% at 4.9 false alarms per video, which improves substantially on the current state of the art. These results are attained using a neural network architecture designed for fast computation, which can therefore run in real time at greater than 30 frames per second. We further analyze the system's performance on elusive polyps, those polyps which are particularly difficult for endoscopists to detect. First, we show that on fast polyps, which are in the field of view for less than 5 seconds, DEEP^2 attains a sensitivity of 88.5%, compared to a sensitivity of 31.7% for the endoscopists performing the procedure. On even shorter-duration polyps, those that are in the field of view for less than 2 seconds, the difference is even starker: DEEP^2 attains a sensitivity of 84.9% vs. 18.9% for the endoscopists. Second, we examine procedures which are apparently clean, in that no polyps are detected by either the performing endoscopist or the offline annotators. In these sequences, DEEP^2 is able to detect polyps -- not seen by either live endoscopists or offline annotators -- which were later verified to be real polyps: an average of 0.22 polyps per sequence, of which 0.10 are adenomas. Finally, a small preliminary clinical validation indicates that the system will be useful in practice: on 32 procedures, DEEP^2 discovered an average of 1.06 polyps per procedure that would have otherwise been missed by the GI performing the procedure. Future work will be needed to measure the clinical impact on a larger scale.
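To make the headline operating point concrete, the following is a minimal sketch of how a per-polyp sensitivity and a false-alarm rate per video could be tallied. It is an illustration only: the abstract does not specify how detections are matched to ground-truth polyps, so the `VideoResult` record and its fields are hypothetical, and the numbers in the example are invented for the arithmetic, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class VideoResult:
    """Hypothetical per-video tally; field names are illustrative assumptions."""
    num_polyps: int        # ground-truth polyps labelled by the annotators
    num_detected: int      # ground-truth polyps flagged at least once by the detector
    num_false_alarms: int  # detector alerts that match no ground-truth polyp


def sensitivity(results):
    """Fraction of all ground-truth polyps that were detected, pooled over videos."""
    total = sum(r.num_polyps for r in results)
    if total == 0:
        raise ValueError("sensitivity is undefined without ground-truth polyps")
    return sum(r.num_detected for r in results) / total


def false_alarms_per_video(results):
    """Mean number of false alerts per video, including polyp-free videos."""
    if not results:
        raise ValueError("no videos to evaluate")
    return sum(r.num_false_alarms for r in results) / len(results)


# Invented example: three videos, one of them with no polyps at all.
example = [
    VideoResult(num_polyps=3, num_detected=3, num_false_alarms=4),
    VideoResult(num_polyps=2, num_detected=1, num_false_alarms=6),
    VideoResult(num_polyps=0, num_detected=0, num_false_alarms=5),
]
print(f"sensitivity: {sensitivity(example):.1%}")
print(f"false alarms per video: {false_alarms_per_video(example):.1f}")
```

Under this reading, sensitivity is pooled over polyps while the false-alarm rate is averaged over videos, so the two figures use different denominators and must be reported together to describe one operating point.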