Art Sheets for Art Datasets

Ramya Malur Srinivasan
Jordan Jennifer Famularo
Beth Coleman
NeurIPS Datasets and Benchmarks Track (2021)

Abstract

As machine learning (ML) techniques are increasingly employed to authenticate artworks and estimate their market value, computational tasks have expanded across a variety of creative domains and datasets drawn from the arts. With recent progress in generative modeling, ML techniques are also used to simulate artistic styles and to produce new content in media such as music, visual art, and poetry. While this progress has opened up new creative avenues, it has also paved the way for adverse downstream effects, including cultural appropriation (e.g., cultural misrepresentation, offense, and undervaluing) and the amplification of gender and racial stereotypes. Many of these issues stem from the training data, in ways that diligent evaluation can uncover, prevent, and mitigate. In this paper, we provide a checklist of questions customized for art datasets. Building on the questionnaire for datasets provided in Datasheets for Datasets, the checklist guides assessment of developer motivation together with dataset provenance, composition, collection, pre-processing, cleaning, labeling, use (including data generation/synthesis), distribution, and maintenance. Case studies exemplify the value of our questionnaire. We hope our work aids ML scientists and developers by providing a framework for the responsible design, development, and use of art datasets.

Research Areas