
David Steiner

Health/Pathology team. Integration of molecular, laboratory, and imaging data. Background in molecular biology and molecular diagnostic test development, including analytical and clinical validation.
Authored Publications
    Task-specific deep learning models in histopathology offer promising opportunities for improving diagnosis, clinical research, and precision medicine. However, development of such models is often limited by availability of high-quality data. Foundation models in histopathology that learn general representations across a wide range of tissue types, diagnoses, and magnifications offer the potential to reduce the data, compute, and technical expertise necessary to develop task-specific deep learning models with the required level of model performance. In this work, we describe the development and evaluation of foundation models for histopathology via self-supervised learning (SSL). We first establish a diverse set of benchmark tasks involving 17 unique tissue types and 12 unique cancer types and spanning different optimal magnifications and task types. Next, we use this benchmark to explore and evaluate histopathology-specific SSL methods, followed by further evaluation on held-out patch-level and weakly supervised tasks. We found that standard SSL methods thoughtfully applied to histopathology images are performant across our benchmark tasks and that domain-specific methodological improvements can further increase performance. Our findings reinforce the value of using domain-specific SSL methods in pathology, and establish a set of high-quality foundation models to enable further research across diverse applications.
    Predicting lymph node metastasis from primary tumor histology and clinicopathologic factors in colorectal cancer using deep learning
    Fraser Tan
    Isabelle Flament-Auvigne
    Trissia Brown
    Markus Plass
    Robert Reihs
    Heimo Mueller
    Kurt Zatloukal
    Pema Richeson
    Lily Peng
    Craig Mermel
    Cameron Chen
    Saurabh Gombar
    Thomas Montine
    Jeanne Shen
    Nature Communications Medicine, vol. 3 (2023), pp. 59
    Background: The presence of lymph node metastasis (LNM) influences prognosis and clinical decision-making in colorectal cancer. However, detection of LNM is variable and depends on a number of external factors. Deep learning has shown success in computational pathology, but has struggled to boost performance when combined with known predictors. Methods: Machine-learned features are created by clustering deep learning embeddings of small patches of tumor in colorectal cancer via k-means, and then selecting the top clusters that add predictive value to a logistic regression model when combined with known baseline clinicopathologic variables. We then analyze the performance of logistic regression models trained with and without these machine-learned features in combination with the baseline variables. Results: The extracted machine-learned features provide independent signal for the presence of LNM (AUROC: 0.638, 95% CI: [0.590, 0.683]). Furthermore, the machine-learned features add predictive value to the set of 6 clinicopathologic variables in an external validation set (likelihood ratio test, p < 0.00032; AUROC: 0.740, 95% CI: [0.701, 0.780]). A model incorporating these features can also further risk-stratify patients with and without identified metastasis (p < 0.001 for both stage II and stage III). Conclusion: This work demonstrates an effective approach to combine deep learning with established clinicopathologic factors in order to identify independently informative features associated with LNM. Further work building on these specific results may have an important impact on prognostication and therapeutic decision-making for LNM. Additionally, this general computational approach may prove useful in other contexts.
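The feature-construction step described in this abstract (k-means clustering of patch embeddings, then summarizing each case by its patches' cluster assignments) can be sketched in plain Python. This is a hypothetical illustration, not the study's code: embeddings are assumed to be plain float vectors, each case is assumed to be summarized by the fraction of its patches falling in each cluster, and the cluster-selection and logistic regression steps are omitted. The function names are invented for this sketch.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: _sq_dist(p, centroids[c]))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[List[float]]:
    """Lloyd's algorithm with centroids initialized from k randomly sampled points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def cluster_fraction_features(
    case_patches: Dict[str, List[Vector]], centroids: List[List[float]]
) -> Dict[str, List[float]]:
    """For each case, the fraction of its patches assigned to each cluster."""
    features = {}
    for case_id, patches in case_patches.items():
        counts = [0] * len(centroids)
        for p in patches:
            counts[_nearest(p, centroids)] += 1
        features[case_id] = [n / len(patches) for n in counts]
    return features
```

Each column of the resulting per-case matrix is the kind of candidate feature that could then be screened for added predictive value alongside baseline clinicopathologic variables.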
    Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge
    Wouter Bulten
    Kimmo Kartasalo
    Po-Hsuan Cameron Chen
    Peter Ström
    Hans Pinckaers
    Kunal Nagpal
    Yuannan Cai
    Hester van Boven
    Robert Vink
    Christina Hulsbergen-van de Kaa
    Jeroen van der Laak
    Mahul B. Amin
    Andrew J. Evans
    Theodorus van der Kwast
    Robert Allan
    Peter A. Humphrey
    Henrik Grönberg
    Hemamali Samaratunga
    Brett Delahunt
    Toyonori Tsuzuki
    Tomi Häkkinen
    Lars Egevad
    Maggie Demkin
    Sohier Dane
    Fraser Tan
    Masi Valkonen
    Lily Peng
    Craig H. Mermel
    Pekka Ruusuvuori
    Geert Litjens
    Martin Eklund
    the PANDA challenge consortium
    Nature Medicine, vol. 28 (2022), pp. 154-163
    Artificial intelligence (AI) has shown promise for diagnosing prostate cancer in biopsies. However, results have been limited to individual studies, lacking validation in multinational settings. Competitions have been shown to be accelerators for medical imaging innovations, but their impact is hindered by lack of reproducibility and independent validation. With this in mind, we organized the PANDA challenge—the largest histopathology competition to date, joined by 1,290 developers—to catalyze development of reproducible AI algorithms for Gleason grading using 10,616 digitized prostate biopsies. We validated that a diverse set of submitted algorithms reached pathologist-level performance on independent cross-continental cohorts, fully blinded to the algorithm developers. On United States and European external validation sets, the algorithms achieved agreements of 0.862 (quadratically weighted κ, 95% confidence interval (CI), 0.840–0.884) and 0.868 (95% CI, 0.835–0.900) with expert uropathologists. Successful generalization across different patient populations, laboratories and reference standards, achieved by a variety of algorithmic approaches, warrants evaluating AI-based Gleason grading in prospective clinical trials.
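The agreement metric above, quadratically weighted κ, compares two raters' ordinal grades and penalizes each disagreement by the squared distance between the grades, so confusing adjacent grades costs far less than confusing distant ones. A minimal textbook sketch, assuming grades are encoded as integers 0 to K−1; it is not the challenge's evaluation code.

```python
from typing import Sequence


def quadratic_weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int], num_classes: int) -> float:
    """Cohen's kappa with quadratic weights for ordinal labels 0..num_classes-1."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed confusion matrix: observed[i][j] counts items graded i by A and j by B.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    # Marginal histograms of each rater's grades.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if the two raters graded independently with these marginals.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0:
        return 1.0  # degenerate case: both raters used the same single grade throughout
    return 1.0 - weighted_observed / weighted_expected
```

For example, `quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 3, 3], num_classes=4)` gives 11/12 (about 0.917): the single disagreement is one grade apart, so it is penalized only lightly.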
    Deep learning models for histologic grading of breast cancer and association with disease prognosis
    Trissia Brown
    Isabelle Flament
    Fraser Tan
    Yuannan Cai
    Kunal Nagpal
    Emad Rakha
    David J. Dabbs
    Niels Olson
    James H. Wren
    Elaine E. Thompson
    Erik Seetao
    Carrie Robinson
    Melissa Miao
    Fabien Beckers
    Lily Hao Yi Peng
    Craig Mermel
    Cameron Chen
    npj Breast Cancer (2022)
    Histologic grading of breast cancer involves review and scoring of three well-established morphologic features: mitotic count, nuclear pleomorphism, and tubule formation. Taken together, these features form the basis of the Nottingham Grading System, which is used to inform breast cancer characterization and prognosis. In this study, we developed deep learning models to perform histologic scoring of all three components using digitized hematoxylin and eosin-stained slides containing invasive breast carcinoma. We then evaluated the prognostic potential of these models using an external test set and progression-free interval as the primary outcome. The individual component models performed at or above published benchmarks for algorithm-based grading approaches and achieved high concordance rates in comparison to pathologist grading. Prognostic performance of histologic scoring provided by the deep learning-based grading was on par with that of pathologists performing review of matched slides. Additionally, by providing scores for each component feature, the deep learning-based approach offered the potential to identify the grading components contributing most to prognostic value. This may enable optimized prognostic models, as well as opportunities to improve access to consistent grading and to better understand the links between histologic features and clinical outcomes in breast cancer.
    Onboarding Materials as Cross-functional Boundary Objects for Developing AI Assistants
    Lauren Wilcox
    Samantha Winter
    Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, ACM (2021) (to appear)
    Deep neural networks (DNNs) routinely achieve state-of-the-art performance in a wide range of tasks. This case study reports on the development of onboarding (i.e., training) materials for a DNN-based medical AI Assistant to aid in the grading of prostate cancer. Specifically, we describe how the process of developing these materials deepened the team's understanding of end-user requirements, leading to changes in the development and assessment of the underlying machine learning model. In this sense, the onboarding materials served as a useful boundary object for a cross-functional team. We also present evidence of the utility of the subsequent onboarding materials by describing which information was found useful by participants in an experimental study.
    Comparative analysis of machine learning approaches to classify tumor mutation burden in lung adenocarcinoma using histopathology images
    Apaar Sadhwani
    Huang-Wei Chang
    Ali Behrooz
    Trissia Brown
    Isabelle Flament
    Hardik Patel
    Robert Findlater
    Vanessa Velez
    Fraser Tan
    Kamilla Marta Tekiela
    Eunhee Yi
    Craig Mermel
    Debra Hanks
    Cameron Chen
    Kimary Kulig
    Cory Batenchuk
    Peter Cimermancic
    Scientific Reports (2021)
    Both histologic subtype and tumor mutation burden (TMB) represent important biomarkers in lung cancer, with implications for patient prognosis as well as treatment decisions. Typically, TMB is evaluated by comprehensive genomic profiling, but this requires use of finite tissue specimens as well as costly and time-consuming laboratory processes. Histologic subtype classification represents an established component of lung adenocarcinoma histopathology, but it can be a challenging task with substantial inter-pathologist variability. Here we developed a deep learning system to both classify histologic patterns in lung adenocarcinoma and predict TMB status using hematoxylin and eosin (H&E) stained whole slide images. We first trained a convolutional neural network to comprehensively infer histologic subtypes across whole slide images of lung cancer resection specimens. This model achieved a patch-level area under the receiver operating characteristic curve (AUROC) of 0.78-0.98 for the individual features on a test set including TCGA slides and 50 external dataset slides. We then integrated the output of this model with clinico-demographic data to develop an interpretable model for TMB classification and evaluated the end-to-end system on 172 held-out cases from TCGA, achieving an AUROC of 0.71 [95% CI 0.62-0.79]. Finally, we also developed a weakly supervised model for TMB classification, finding that our histologic subtype-based approach achieves similar performance (AUROC of 0.72, 95% CI XXX) to the weakly supervised approach. These results suggest that interpretable approaches for molecular biomarker prediction based on established histologic patterns are feasible and comparable to deep learning approaches that are more difficult to explain.
    Determining Breast Cancer Biomarker Status and Associated Morphological Features Using Deep Learning
    Paul Gamble
    Harry Wang
    Fraser Tan
    Melissa Moran
    Trissia Brown
    Isabelle Flament
    Emad A. Rakha
    Michael Toss
    David J. Dabbs
    Peter Regitnig
    Niels Olson
    James H. Wren
    Carrie Robinson
    Lily Peng
    Craig Mermel
    Cameron Chen
    Nature Communications Medicine (2021)
    Background: Breast cancer management depends on biomarkers including estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 (ER/PR/HER2). Though existing scoring systems are widely used and well-validated, they can involve costly preparation and variable interpretation. Additionally, discordances between histology and expected biomarker findings can prompt repeat testing to address biological, interpretative, or technical reasons for unexpected results. Methods: We developed three independent deep learning systems (DLS) to directly predict ER/PR/HER2 status for both focal tissue regions (patches) and slides using hematoxylin-and-eosin-stained (H&E) images as input. Models were trained and evaluated using pathologist-annotated slides from three data sources. Areas under the receiver operating characteristic curve (AUCs) were calculated for test sets at both a patch level (>135 million patches, 181 slides) and slide level (n = 3274 slides, 1249 cases, 37 sites). Interpretability analyses were performed using Testing with Concept Activation Vectors (TCAV), saliency analysis, and pathologist review of clustered patches. Results: The patch-level AUCs are 0.939 (95% CI 0.936–0.941), 0.938 (0.936–0.940), and 0.808 (0.802–0.813) for ER/PR/HER2, respectively. At the slide level, AUCs are 0.86 (95% CI 0.84–0.87), 0.75 (0.73–0.77), and 0.60 (0.56–0.64) for ER/PR/HER2, respectively. Interpretability analyses show known biomarker-histomorphology associations, including associations of low-grade and lobular histology with ER/PR positivity, and increased inflammatory infiltrates with triple-negative staining. Conclusions: This study presents rapid breast cancer biomarker estimation from routine H&E slides and builds on prior advances by prioritizing interpretability of computationally learned features in the context of existing pathological knowledge.
    A.I.-based Gleason Grading for Stratification of Prostate Cancer Outcomes
    Kunal Nagpal
    Matthew Symonds
    Melissa Moran
    Markus Plass
    Robert Reihs
    Farah Nader
    Fraser Tan
    Yuannan Cai
    Trissia Brown
    Isabelle Flament
    Mahul Amin
    Martin Stumpe
    Heimo Muller
    Peter Regitnig
    Andreas Holzinger
    Lily Hao Yi Peng
    Cameron Chen
    Kurt Zatloukal
    Craig Mermel
    Communications Medicine (2021)
    Background. Gleason grading of prostate cancer is an important prognostic factor, but suffers from poor reproducibility, particularly among non-subspecialist pathologists. Although artificial intelligence (A.I.) tools have demonstrated Gleason grading on par with expert pathologists, it remains an open question whether and to what extent A.I. grading translates to better prognostication. Methods. In this study, we developed a system to predict prostate cancer-specific mortality via A.I.-based Gleason grading and subsequently evaluated its ability to risk-stratify patients on an independent retrospective cohort of 2807 prostatectomy cases from a single European center with 5–25 years of follow-up (median: 13, interquartile range 9–17). Results. Here, we show that the A.I.’s risk scores produced a C-index of 0.84 (95% CI 0.80–0.87) for prostate cancer-specific mortality. Upon discretizing these risk scores into risk groups analogous to pathologist Grade Groups (GG), the A.I. has a C-index of 0.82 (95% CI 0.78–0.85). On the subset of cases with a GG provided in the original pathology report (n = 1517), the A.I.’s C-indices are 0.87 and 0.85 for continuous and discrete grading, respectively, compared to 0.79 (95% CI 0.71–0.86) for GG obtained from the reports. These represent improvements of 0.08 (95% CI 0.01–0.15) and 0.07 (95% CI 0.00–0.14), respectively. Conclusions. Our results suggest that A.I.-based Gleason grading can lead to effective risk stratification and warrants further evaluation for improving disease management.
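The C-index reported above measures how often a model orders patients' risk consistently with their observed outcomes: 1.0 is perfect ranking and 0.5 is chance. A minimal sketch of Harrell's concordance index for right-censored data, assuming one risk score, follow-up time, and event indicator per patient; pairs with tied follow-up times are simply skipped, a simplification relative to fuller implementations. This is not the study's evaluation code.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[bool], risk_scores: Sequence[float]
) -> float:
    """Fraction of comparable pairs in which the higher-risk patient has the event first.

    A pair (i, j) is comparable when patient i has an observed event and a
    strictly shorter follow-up time than patient j. Tied risk scores count as
    half-concordant; pairs with tied follow-up times are ignored.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient can only be the longer-followed member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

With follow-up times [1, 2, 3], events [True, False, True], and risk scores [3, 3, 2], only the two pairs anchored on the first patient's event are comparable; one is concordant and one is tied, giving 0.75.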
    Interpretable Survival Prediction for Colorectal Cancer using Deep Learning
    Melissa Moran
    Markus Plass
    Robert Reihs
    Fraser Tan
    Isabelle Flament
    Trissia Brown
    Peter Regitnig
    Cameron Chen
    Apaar Sadhwani
    Bob MacDonald
    Benny Ayalew
    Lily Hao Yi Peng
    Heimo Mueller
    Zhaoyang Xu
    Martin Stumpe
    Kurt Zatloukal
    Craig Mermel
    npj Digital Medicine (2021)
    Deriving interpretable prognostic features from deep-learning-based prognostic histopathology models remains a challenge. In this study, we developed a deep learning system (DLS) for predicting disease-specific survival for stage II and III colorectal cancer using 3652 cases (27,300 slides). When evaluated on two validation datasets containing 1239 cases (9340 slides) and 738 cases (7140 slides), respectively, the DLS achieved a 5-year disease-specific survival AUC of 0.70 (95% CI: 0.66–0.73) and 0.69 (95% CI: 0.64–0.72), and added significant predictive value to a set of nine clinicopathologic features. To interpret the DLS, we explored the ability of different human-interpretable features to explain the variance in DLS scores. We observed that clinicopathologic features such as T-category, N-category, and grade explained a small fraction of the variance in DLS scores (R² = 18% in both validation sets). Next, we generated human-interpretable histologic features by clustering embeddings from a deep-learning-based image-similarity model and showed that they explained the majority of the variance (R² of 73–80%). Furthermore, the clustering-derived feature most strongly associated with high DLS scores was also highly prognostic in isolation. With a distinct visual appearance (poorly differentiated tumor cell clusters adjacent to adipose tissue), this feature was identified by annotators with 87.0–95.5% accuracy. Our approach can be used to explain predictions from a prognostic deep learning model and uncover potentially novel prognostic features that can be reliably identified by people for future validation studies.
    Closing the translation gap: AI applications in digital pathology
    Cameron Chen
    Craig Mermel
    Biochimica et Biophysica Acta (BBA) - Reviews on Cancer (2020)
    Recent advances in artificial intelligence show tremendous promise to improve the accuracy, reproducibility, and availability of medical diagnostics across a number of medical subspecialities. This is especially true in the field of digital pathology, which has recently witnessed a surge in publications describing state-of-the-art performance for machine learning models across a wide range of diagnostic applications. Nonetheless, despite this promise, there remain significant gaps in translating applications for any of these technologies into actual clinical practice. In this review, we will first give a brief overview of the recent progress in applying AI to digitized pathology images, focusing on how these tools might be applied in clinical workflows in the near term to improve the accuracy and efficiency of pathologists. Then we define and describe in detail the various factors that need to be addressed in order to successfully close the "translation gap" for AI applications in digital pathology.
    Deep learning-based survival prediction for multiple cancer types using histopathology images
    Zhaoyang Xu
    Apaar Sadhwani
    Hongwu Wang
    Isabelle Flament
    Craig Mermel
    Cameron Chen
    Martin Stumpe
    PLOS ONE (2020)
    Providing prognostic information at the time of cancer diagnosis has important implications for treatment and monitoring. Although cancer staging, histopathological assessment, molecular features, and clinical variables can provide useful prognostic insights, improving risk stratification remains an active research area. We developed a deep learning system (DLS) to predict disease-specific survival across 10 cancer types from The Cancer Genome Atlas (TCGA). We used a weakly supervised approach without pixel-level annotations, and tested three different survival loss functions. The DLS was developed using 9,086 slides from 3,664 cases and evaluated using 3,009 slides from 1,216 cases. In multivariable Cox regression analysis of the combined cohort including all 10 cancers, the DLS was significantly associated with disease-specific survival (hazard ratio of 1.58, 95% CI 1.28–1.70, p<0.0001) after adjusting for cancer type, stage, age, and sex. In a per-cancer adjusted subanalysis, the DLS remained a significant predictor of survival in 5 of 10 cancer types. Compared to a baseline model including stage, age, and sex, the c-index of the model demonstrated an absolute 3.7% improvement (95% CI 1.0–6.5) in the combined cohort. Additionally, our models stratified patients within individual cancer stages, particularly stage II (p = 0.025) and stage III (p<0.001). By developing and evaluating prognostic models across multiple cancer types, this work represents one of the most comprehensive studies exploring the direct prediction of clinical outcomes using deep learning and histopathology images. Our analysis demonstrates the potential for this approach to provide significant prognostic information in multiple cancer types, and even within specific pathologic stages.
However, given the relatively small number of cases and observed clinical events for a deep learning task of this type, we observed wide confidence intervals for model performance, thus highlighting that future work will benefit from larger datasets assembled for the purposes of survival modeling.
    Evaluation of the Use of Combined Artificial Intelligence and Pathologist Assessment to Review and Grade Prostate Biopsies
    Kunal Nagpal
    Davis J. Foote
    Adam Pearce
    Samantha Winter
    Matthew Symonds
    Liron Yatziv
    Trissia Brown
    Isabelle Flament-Auvigne
    Fraser Tan
    Martin C. Stumpe
    Cameron Chen
    Craig Mermel
    JAMA Network Open (2020)
    Importance: Expert-level artificial intelligence (AI) algorithms for prostate biopsy grading have recently been developed. However, the potential impact of integrating such algorithms into pathologist workflows remains largely unexplored. Objective: To evaluate an expert-level AI-based assistive tool when used by pathologists for the grading of prostate biopsies. Design, Setting, and Participants: This diagnostic study used a fully crossed multiple-reader, multiple-case design to evaluate an AI-based assistive tool for prostate biopsy grading. Retrospective grading of prostate core needle biopsies from 2 independent medical laboratories in the US was performed between October 2019 and January 2020. A total of 20 general pathologists reviewed 240 prostate core needle biopsies from 240 patients. Each pathologist was randomized to 1 of 2 study cohorts. The 2 cohorts reviewed every case in the opposite modality (with AI assistance vs without AI assistance) to each other, with the modality switching after every 10 cases. After a minimum 4-week washout period for each batch, the pathologists reviewed the cases for a second time using the opposite modality. The pathologist-provided grade group for each biopsy was compared with the majority opinion of urologic pathology subspecialists. Exposure: An AI-based assistive tool for Gleason grading of prostate biopsies. Main Outcomes and Measures: Agreement between pathologists and subspecialists with and without the use of an AI-based assistive tool for the grading of all prostate biopsies and Gleason grade group 1 biopsies. Results: Biopsies from 240 patients (median age, 67 years; range, 39-91 years) with a median prostate-specific antigen level of 6.5 ng/mL (range, 0.6-97.0 ng/mL) were included in the analyses.
Artificial intelligence–assisted review by pathologists was associated with a 5.6% increase (95% CI, 3.2%-7.9%; P < .001) in agreement with subspecialists (from 69.7% for unassisted reviews to 75.3% for assisted reviews) across all biopsies and a 6.2% increase (95% CI, 2.7%-9.8%; P = .001) in agreement with subspecialists (from 72.3% for unassisted reviews to 78.5% for assisted reviews) for grade group 1 biopsies. A secondary analysis indicated that AI assistance was also associated with improvements in tumor detection, mean review time, mean self-reported confidence, and interpathologist agreement. Conclusions and Relevance: In this study, the use of an AI-based assistive tool for the review of prostate biopsies was associated with improvements in the quality, efficiency, and consistency of cancer detection and grading.
    Development and Validation of a Deep Learning Algorithm for Gleason Grading of Prostate Cancer From Biopsy Specimens
    Kunal Nagpal
    Davis Foote
    Fraser Tan
    Cameron Chen
    Naren Manoj
    Niels Olson
    Jenny Smith
    Arash Mohtashamian
    Brandon Peterson
    Mahul Amin
    Andrew Evans
    Joan Sweet
    Carol Cheung
    Theodorus van der Kwast
    Ankur Sangoi
    Ming Zhou
    Robert W. Allan
    Peter A Humphrey
    Jason Hipp
    Krishna Kumar Gadepalli
    Lily Hao Yi Peng
    Martin Stumpe
    Craig Mermel
    JAMA Oncology (2020)
    Importance: For prostate cancer, Gleason grading of the biopsy specimen plays a pivotal role in determining case management. However, Gleason grading is associated with substantial interobserver variability, resulting in a need for decision support tools to improve the reproducibility of Gleason grading in routine clinical practice. Objective: To evaluate the ability of a deep learning system (DLS) to grade diagnostic prostate biopsy specimens. Design, Setting, and Participants: The DLS was evaluated using 752 deidentified digitized images of formalin-fixed paraffin-embedded prostate needle core biopsy specimens obtained from 3 institutions in the United States, including 1 institution not used for DLS development. To obtain the Gleason grade group (GG), each specimen was first reviewed by 2 expert urologic subspecialists from a multi-institutional panel of 6 individuals (years of experience: mean, 25 years; range, 18-34 years). A third subspecialist reviewed discordant cases to arrive at a majority opinion. To reduce diagnostic uncertainty, all subspecialists had access to an immunohistochemical-stained section and 3 histologic sections for every biopsied specimen. Their review was conducted from December 2018 to June 2019. Main Outcomes and Measures: The frequency of the exact agreement of the DLS with the majority opinion of the subspecialists in categorizing each tumor-containing specimen as 1 of 5 categories: nontumor, GG1, GG2, GG3, or GG4-5. For comparison, the rate of agreement of 19 general pathologists’ opinions with the subspecialists’ majority opinions was also evaluated. Results: For grading tumor-containing biopsy specimens in the validation set (n = 498), the rate of agreement with subspecialists was significantly higher for the DLS (71.7%; 95% CI, 67.9%-75.3%) than for general pathologists (58.0%; 95% CI, 54.5%-61.4%) (P < .001).
In subanalyses of biopsy specimens from an external validation set (n = 322), the Gleason grading performance of the DLS remained similar. For distinguishing nontumor from tumor-containing biopsy specimens (n = 752), the rate of agreement with subspecialists was 94.3% (95% CI, 92.4%-95.9%) for the DLS and similar at 94.7% (95% CI, 92.8%-96.3%) for general pathologists (P = .58). Conclusions and Relevance: In this study, the DLS showed higher proficiency than general pathologists at Gleason grading prostate needle core biopsy specimens and generalized to an independent institution. Future research is necessary to evaluate the potential utility of using the DLS as a decision support tool in clinical workflows and to improve the quality of prostate cancer grading for therapy decisions.
    Pathology Outlines: Computer-Aided Diagnosis
    Kunal Nagpal
    Cameron Chen
    Craig Mermel
    Pathology Outlines (2020)
    A computer-aided diagnosis (CADx) tool in pathology is a system meant to assist with interpreting histologic or cytologic findings of interest.
    "Hello AI": Uncovering the Onboarding Needs of Medical Practitioners for Human-AI Collaborative Decision-Making
    Samantha Winter
    Lauren Wilcox
    Proc. ACM Hum.-Comput. Interact., Association for Computing Machinery, ACM CSCW, New York, NY, USA (2019), pp. 24 (to appear)
    Although rapid advances in machine learning have made it increasingly applicable to expert decision-making, the delivery of accurate algorithmic predictions alone is insufficient for effective human–AI collaboration. In this work, we investigate the key types of information medical experts desire when they are first introduced to a diagnostic AI assistant. In a qualitative lab study, we interviewed 21 pathologists before, during, and after being presented deep neural network (DNN) predictions for prostate cancer diagnosis, to learn the types of information that they desired about the AI assistant. Our findings reveal that, far beyond understanding the local, case-specific reasoning behind any model decision, clinicians desired upfront information about basic, global properties of the model, such as its known strengths and limitations, its subjective point-of-view, and its overall design objective—what it’s designed to be optimized for. Participants compared these information needs to the collaborative mental models they develop of their medical colleagues when seeking a second opinion: the medical perspectives and standards that those colleagues embody, and the compatibility of those perspectives with their own diagnostic patterns. These findings broaden and enrich discussions surrounding AI transparency for collaborative decision-making, providing a richer understanding of what experts find important in their introduction to AI assistants before integrating them into routine practice.
    Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation
    Anna Dagna Majkowska
    Sid Mittal
    Joshua Reicher
    Scott Mayer McKinney
    Gavin Duggan
    Cameron Chen
    Sreenivasa Raju Kalidindi
    Alexander Ding
    Shravya Ramesh Shetty
    Radiology (2019)
    Background Deep learning has the potential to augment the use of chest radiography in clinical radiology, but challenges include poor generalizability, spectrum bias, and difficulty comparing across studies. Purpose To develop and evaluate deep learning models for chest radiograph interpretation by using radiologist-adjudicated reference standards. Materials and Methods Deep learning models were developed to detect four findings (pneumothorax, opacity, nodule or mass, and fracture) on frontal chest radiographs. This retrospective study used two data sets. Data set 1 (DS1) consisted of 759 611 images from a multicity hospital network; ChestX-ray14 is a publicly available data set with 112 120 images. Natural language processing and expert review of a subset of images provided labels for 657 954 training images. Test sets consisted of 1818 and 1962 images from DS1 and ChestX-ray14, respectively. Reference standards were defined by radiologist-adjudicated image review. Performance was evaluated by area under the receiver operating characteristic curve analysis, sensitivity, specificity, and positive predictive value. Four radiologists reviewed test set images for performance comparison. Inverse probability weighting was applied to DS1 to account for positive radiograph enrichment and estimate population-level performance. Results In DS1, population-adjusted areas under the receiver operating characteristic curve for pneumothorax, nodule or mass, airspace opacity, and fracture were, respectively, 0.95 (95% confidence interval [CI]: 0.91, 0.99), 0.72 (95% CI: 0.66, 0.77), 0.91 (95% CI: 0.88, 0.93), and 0.86 (95% CI: 0.79, 0.92). With ChestX-ray14, areas under the receiver operating characteristic curve were 0.94 (95% CI: 0.93, 0.96), 0.91 (95% CI: 0.89, 0.93), 0.94 (95% CI: 0.93, 0.95), and 0.81 (95% CI: 0.75, 0.86), respectively.
Conclusion Expert-level models for detecting clinically relevant chest radiograph findings were developed for this study by using adjudicated reference standards and with population-level performance estimation. Radiologist-adjudicated labels for 2412 ChestX-ray14 validation set images and 1962 test set images are provided.
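Inverse probability weighting, as applied to DS1 above, reweights each test image by the reciprocal of its probability of being sampled into the enriched test set, so that metrics such as positive predictive value reflect the source population rather than the enriched sample. A minimal sketch, assuming per-image sampling probabilities are known; the function and field names and the probabilities in the example are hypothetical, and this is not the study's code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WeightedMetrics:
    sensitivity: float
    specificity: float
    ppv: float


def _ratio(num: float, den: float) -> float:
    """Safe division that returns NaN when the denominator is zero."""
    return num / den if den > 0 else float("nan")


def ipw_metrics(
    labels: Sequence[bool], predictions: Sequence[bool], sampling_probs: Sequence[float]
) -> WeightedMetrics:
    """Sensitivity, specificity, and PPV with each image weighted by 1 / P(sampled)."""
    tp = fp = tn = fn = 0.0
    for y, y_hat, p in zip(labels, predictions, sampling_probs):
        if not 0.0 < p <= 1.0:
            raise ValueError("sampling probabilities must lie in (0, 1]")
        w = 1.0 / p  # each sampled image stands in for 1/p images in the source population
        if y and y_hat:
            tp += w
        elif y:
            fn += w
        elif y_hat:
            fp += w
        else:
            tn += w
    return WeightedMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
    )
```

As a hypothetical example, if every positive image was kept (probability 1.0) but only 10% of negatives were sampled, `ipw_metrics([True, True, False, False], [True, True, True, False], [1.0, 1.0, 0.1, 0.1])` leaves sensitivity (1.0) and specificity (0.5) unchanged, because sampling depended only on the label, but lowers PPV from the unweighted 2/3 to 1/6.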
    A computer-aided detection (CADe) tool in pathology is a system to assist with locating histologic or cytologic findings of interest.
    Impact of Deep Learning Assistance on the Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer
    Bob MacDonald
    Peter Truszkowski
    Jason Hipp
    Christopher Lee Gammage
    Florence Thng
    Lily Peng
    Martin Stumpe
    American Journal of Surgical Pathology (2018)
    Advances in the quality of whole-slide images have set the stage for the clinical use of digital images in anatomic pathology. Along with advances in computer image analysis, this raises the possibility for computer-assisted diagnostics in pathology to improve histopathologic interpretation and clinical care. To evaluate the potential impact of digital assistance on interpretation of digitized slides, we conducted a multireader multicase study utilizing our deep learning algorithm for the detection of breast cancer metastasis in lymph nodes. Six pathologists reviewed 70 digitized slides from lymph node sections in 2 reader modes, unassisted and assisted, with a wash-out period between sessions. In the assisted mode, the deep learning algorithm was used to identify and outline regions with high likelihood of containing tumor. Algorithm-assisted pathologists demonstrated higher accuracy than either the algorithm or the pathologist alone. In particular, algorithm assistance significantly increased the sensitivity of detection for micrometastases (91% vs. 83%, P=0.02). In addition, average review time per image was significantly shorter with assistance than without assistance for both micrometastases (61 vs. 116 s, P=0.002) and negative images (111 vs. 137 s, P=0.018). Lastly, pathologists were asked to provide a numeric score regarding the difficulty of each image classification. On the basis of this score, pathologists considered the image review of micrometastases to be significantly easier when interpreted with assistance (P=0.0005). Utilizing a proof of concept assistant tool, this study demonstrates the potential of a deep learning algorithm to improve pathologist accuracy and efficiency in a digital pathology workflow.