When is ML data good?: Valuing in Public Health Datafication
Abstract
Data-driven approaches that form the foundation of advancements in artificial intelligence (AI) and machine learning (ML) are powered in large part by human infrastructures that enable the collection of large datasets. We examine the movement of data through multiple stages of data collection in the context of public health in India, where the data workers include frontline health workers, data stewards, and AI/ML developers. We conducted interviews with these stakeholders to understand how they value data differently at each stage, how data are worked upon to attain this value, and the challenges that arise in the process. Our work uncovers tensions in valuing across stakeholders and lays out implications for work on ML datasets. We discuss how these tensions arise and how they might be addressed, and the need for better transparency and accountability as data are transformed from one stage to the next.