
Annotator Response Distributions as a Sampling Frame

Christopher Homan
Lora Mois Aroyo
LREC Workshop on Perspectivist NLP (2022)


Annotator disagreement is often dismissed as noise or as the result of poor annotation process quality. Others have argued that it can be meaningful. But lacking a rigorous statistical foundation, the analysis of disagreement patterns can resemble a high-tech form of tea-leaf reading. We contribute a framework for analyzing the variation of per-item annotator response distributions to data for humans-in-the-loop machine learning. We provide visualizations for, and use the framework to analyze the variance in, a crowdsourced dataset of hard-to-classify examples from the OpenImages archive.
