Iterative quality control strategies for expert medical image labeling

Sonia Phene
Abigail Huang
Rebecca Ackermann
Olga Kanzheleva
Caitlin Taggart
Proceedings of the AAAI Conference on Human Computation and Crowdsourcing (2021)

Abstract

Data quality is a key concern for artificial intelligence (AI) efforts that rely on crowdsourced data collection. In medicine in particular, labeled data must meet higher quality standards, or the resulting AI may harm patients or perpetuate biases. What challenges are involved in expert medical labeling? What processes do labeling teams employ? In this study, we interviewed members of teams developing AI for medical imaging across four subdomains (ophthalmology, radiology, pathology, and dermatology). We identify a set of common practices for ensuring data quality. We describe one instance of low-quality labeling caught by post-launch monitoring. However, the more common pattern is to involve experts in an iterative process of defining, testing, and refining tasks and instructions. Teams invest in these upstream efforts in order to mitigate downstream quality issues during large-scale labeling.