Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions

Abhijit Guha Roy

Jie Jessie Ren

Shekoofeh Azizi

Aaron Loh

Vivek Natarajan

Basil Mustafa

Nick Pawlowski

Jan Freyberg

Yuan Liu

Zach William Beaver

Nam Vo

Peggy Bui

Samantha Winter

Patricia MacWilliams

Greg Corrado

Umesh Telang

Yun Liu

Taylan Cemgil

Alan Karthikesalingam

Balaji Lakshminarayanan

Jim Winkens

Medical Image Analysis (2021)


Abstract

Supervised deep learning models have proven highly effective at classifying dermatological conditions. These models rely on abundant labeled training examples. However, in the real world, many dermatological conditions are individually too infrequent for per-condition classification with supervised learning. Although individually infrequent, these conditions may collectively be common and are therefore clinically significant in aggregate. To prevent models from generating erroneous outputs on such examples, there remains a considerable unmet need for deep learning systems that can better detect such infrequent conditions. These infrequent 'outlier' conditions are seen very rarely (or not at all) during training. In this paper, we frame this task as an out-of-distribution (OOD) detection problem. We set up a benchmark ensuring that outlier conditions are disjoint between the model training, validation, and test sets. Unlike most traditional OOD benchmarks, which detect dataset distribution shift, we aim to detect semantic differences, often referred to as near-OOD detection, which is a more difficult task. We propose a novel hierarchical outlier detection (HOD) approach, which assigns multiple abstention classes for each training outlier class and jointly performs a coarse classification of inliers vs. outliers, along with fine-grained classification of the individual classes. We demonstrate that the proposed HOD outperforms existing techniques for outlier-exposure-based OOD detection. We also use different state-of-the-art representation learning approaches (BiT-JFT, SimCLR, MICLe) to improve OOD performance and demonstrate the effectiveness of the HOD loss for them.
Further, we explore different ensembling strategies for OOD detection and propose a diverse ensemble selection process for the best result. We also perform a subgroup analysis over conditions of varying risk levels and different skin types to investigate how OOD performance changes across subgroups, and demonstrate the gains of our framework in comparison to baselines. Furthermore, we go beyond traditional performance metrics and introduce a cost metric to approximate downstream clinical impact. We use this cost metric to compare the proposed method against the baseline, thereby making a stronger case for its effectiveness in real-world deployment scenarios.
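
As a rough illustration of the hierarchical idea described in the abstract, the sketch below combines a fine-grained cross-entropy over all classes with a coarse inlier vs. outlier cross-entropy, and scores OOD-ness by the total probability assigned to the outlier (abstention) classes. The abstract does not specify these details, so everything here is an assumption: the function names `hod_loss` and `ood_score`, the `coarse_weight` parameter, the convention that the first `num_inlier` class indices are inliers, and the choice to obtain coarse probabilities by summing softmax outputs.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def hod_loss(logits, labels, num_inlier, coarse_weight=0.1):
    """Hypothetical hierarchical loss: fine-grained CE plus weighted coarse CE.

    logits: array of shape (batch, num_inlier + num_outlier).
    labels: integer class ids; ids >= num_inlier are training outlier classes.
    coarse_weight: assumed trade-off hyperparameter (not from the source).
    """
    eps = 1e-12
    probs = softmax(logits)
    rows = np.arange(len(labels))

    # Fine-grained term: standard cross-entropy over every individual class.
    fine = -np.log(probs[rows, labels] + eps)

    # Coarse term: collapse class probabilities into inlier vs. outlier mass.
    p_outlier = probs[:, num_inlier:].sum(axis=-1)
    is_outlier = labels >= num_inlier
    p_true_group = np.where(is_outlier, p_outlier, 1.0 - p_outlier)
    coarse = -np.log(p_true_group + eps)

    return float(np.mean(fine + coarse_weight * coarse))


def ood_score(logits, num_inlier):
    """Assumed OOD score: total probability mass on the outlier classes."""
    return softmax(logits)[:, num_inlier:].sum(axis=-1)
```

At inference time, a threshold on `ood_score` would decide whether the classifier abstains; how that threshold is chosen is left open here.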
