When is ML data good?: Valuing in Public Health Datafication

Azra Ismail
Pratyush Kumar
Alex Hanna
Nithya Sambasivan
Neha Kumar
CHI 2022 (2022) (to appear)

Abstract

Data-driven approaches that form the foundation of advancements in artificial intelligence (AI) and machine learning (ML) are powered in large part by human infrastructures that enable the collection of large datasets. We examine the movement of data through multiple stages of data collection in the context of public health in India, where the data workers include frontline health workers, data stewards, and AI/ML developers. We conducted interviews with these stakeholders to understand how they value data differently at each stage, how data are worked upon to attain this value, and what challenges arise in the process. Our work uncovers tensions in valuing across stakeholders and lays out implications for work on ML datasets. We discuss how these tensions arise and how they might be addressed, as well as the need for greater transparency and accountability as data are transformed from one stage to the next.