Expert Discussions Improve Comprehension of Difficult Cases in Medical Image Assessment

Mike Schaekermann
Abigail E. Huang
Rory Sayres
ACM CHI Conference on Human Factors in Computing Systems (CHI 2020) (to appear)

Medical data labeling workflows critically depend on accurate assessments from human experts. Yet human assessments can vary markedly, even among medical experts. Prior research has demonstrated that labeler training benefits performance. Here we used two types of labeler training feedback: highlighting incorrect labels for difficult cases ("individual performance" feedback), and expert discussions from the adjudication of these cases. We presented ten non-specialist eye care professionals with either individual performance feedback alone, or individual performance feedback together with expert discussions. Compared to performance feedback alone, seeing expert discussions significantly improved non-specialists' understanding of the rationale behind the correct diagnosis while motivating changes in their own labeling approach, and also significantly improved average accuracy on one of four pathologies in a held-out test set. This work suggests that image adjudication may provide benefits beyond developing trusted consensus labels, and that exposure to specialist discussions can be an effective training intervention for medical diagnosis.