Data Excellence: Better Data for Better AI

Lora Mois Aroyo

(2020)

Abstract

Human-annotated data plays a crucial role in the current ML/AI climate, where human judgements are referenced as the ultimate source of truth. As such, human-annotated data is a kind of compass for AI, and research on Human Computation has a multiplicative effect on the AI field. Optimizing the cost, scale, and speed of data collection has been at the center of Human Computation research. What is less known is that such optimization is sometimes done at the cost of quality [Riezler, 2014]. Quality is evidently important but unfortunately poorly defined and rarely measured. A decade later, problems inherent to data are beginning to surface: fairness and bias [Goel and Faltings, 2019], quality issues [Crawford and Paglen, 2019], limitations of benchmarks [Kovaleva et al., 2019, Welty et al., 2019], reproducibility in ML research [Pineau et al., 2018, Gunderson and Kjensmo, 2018], and lack of documentation [Katsuno et al., 2019]. Finally, in the rush to be first to market, aspects of data quality such as maintainability, reliability, validity, and fidelity are often overlooked. We want to turn this way of thinking on its head and highlight examples, case studies, and methodologies for excellence in data collection. Currently, to the extent that it happens at all, data excellence happens organically by virtue of individual expertise, diligence, commitment, pride, etc. This could be dangerous: as we grow increasingly dependent on automation technologies, we do not want to be at the mercy of individual exemplars. Instead, we should codify data excellence in a systematic manner and raise standards across the entire industry.

Research Areas

Human-computer interaction and visualization
