Deep Learning Detection of Active Pulmonary Tuberculosis at Chest Radiography Matched the Clinical Performance of Radiologists

Sahar Kazemzadeh
Jin Yu
Shahar Jamshy
Rory Pilgrim
Christina Chen
Neeral Beladia
Chuck Lau
Scott Mayer McKinney
Thad Hughes
Atilla Peter Kiraly
Sreenivasa Raju Kalidindi
Monde Muyoyeta
Jameson Malemela
Ting Shih
Lily Hao Yi Peng
Kat Chou
Cameron Chen
Shravya Ramesh Shetty
Radiology (2022)

Abstract

Background: The World Health Organization (WHO) recommends chest radiography to facilitate tuberculosis (TB) screening. However, chest radiograph interpretation expertise remains limited in many regions.

Purpose: To develop a deep learning system (DLS) to detect active pulmonary TB on chest radiographs and compare its performance to that of radiologists.

Materials and Methods: A DLS was trained and tested using retrospective chest radiographs (acquired between 1996 and 2020) from 10 countries. To improve generalization, large-scale chest radiograph pretraining, attention pooling, and semisupervised learning (“noisy-student”) were incorporated. The DLS was evaluated in a four-country test set (China, India, the United States, and Zambia) and in a mining population in South Africa, with positive TB confirmed with microbiological tests or nucleic acid amplification testing (NAAT). The performance of the DLS was compared with that of 14 radiologists. The authors studied the efficacy of the DLS compared with that of nine radiologists using the Obuchowski-Rockette-Hillis procedure. Given WHO targets of 90% sensitivity and 70% specificity, the operating point of the DLS (0.45) was prespecified to favor sensitivity.

Results: A total of 165 754 images in 22 284 subjects (mean age, 45 years; 21% female) were used for model development and testing. In the four-country test set (1236 subjects, 17% with active TB), the receiver operating characteristic (ROC) curve of the DLS was higher than those for all nine India-based radiologists, with an area under the ROC curve of 0.89 (95% CI: 0.87, 0.91). Compared with these radiologists, at the prespecified operating point, the DLS sensitivity was higher (88% vs 75%, P < .001) and specificity was noninferior (79% vs 84%, P = .004). Trends were similar within other patient subgroups, in the South Africa data set, and across various TB-specific chest radiograph findings. In simulations, the use of the DLS to identify likely TB-positive chest radiographs for NAAT confirmation reduced the cost by 40%–80% per TB-positive patient detected.

Conclusion: A deep learning method was found to be noninferior to radiologists for the determination of active tuberculosis on digital chest radiographs.
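
The abstract reports sensitivity and specificity at a prespecified operating point of 0.45. As a rough illustration of what that means, the sketch below thresholds per-image scores and counts the resulting true and false calls. It is not the authors' evaluation code: the function name, the toy scores and labels, and the choice to treat a score equal to the threshold as positive are all assumptions made for this example.

```python
from typing import Sequence, Tuple

# Threshold quoted in the abstract; how ties at the threshold are handled
# is an assumption of this sketch.
OPERATING_POINT = 0.45


def sensitivity_specificity(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float = OPERATING_POINT,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), calling scores >= threshold positive.

    labels use 1 for confirmed active TB and 0 for TB-negative.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")

    sensitivity = tp / (tp + fn)  # fraction of TB-positive cases flagged
    specificity = tn / (tn + fp)  # fraction of TB-negative cases cleared
    return sensitivity, specificity


# Made-up scores and labels, for illustration only (not study data).
toy_scores = [0.91, 0.62, 0.30, 0.48, 0.10, 0.44, 0.05, 0.77]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(toy_scores, toy_labels)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold in this sketch flags more images as positive, which raises sensitivity at the expense of specificity; that is the trade-off the abstract describes when it says the operating point was chosen to favor sensitivity.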