Documenting Computer Vision Datasets: An Invitation to Reflexive Data Practices

Maria de los Milagros Miceli
Tianling Yang
Laurens Naudts
Martin Schuessler
Diana Serbanescu
Alex Hanna
ACM Conference on Fairness, Accountability, and Transparency (2021)

Abstract

In industrial computer vision, discretionary decisions surrounding the production of image training data remain widely undocumented. Recent research taking issue with such opacity has proposed standardized processes for dataset documentation. In this paper, we expand this space of inquiry through fieldwork at two data processing companies and thirty-four interviews with data workers and a computer vision practitioner. We identify four key issues that hinder the documentation of image datasets and the effective retrieval of production contexts. We argue that reflexivity, understood as a collective consideration of social and intellectual factors that lead to praxis, is a necessary precondition for documentation. Reflexive documentation can help to expose the contexts, relations, routines, and power structures that shape data.