Google Research

Development of a Transparency Artifact for Health Datasets



Machine learning (ML) approaches have shown promising results in a variety of healthcare applications. Data plays a vital role in the development of ML-based healthcare systems that directly impact human lives. Many of the ethical issues with healthcare applications of ML can be traced back to structural inequalities that are reflected in the way we collect and process data. Developing guidelines for improving documentation practices in the creation, use, and maintenance of ML healthcare datasets is therefore of critical importance. In this work, we introduce Healthsheet, an adaptation and expansion of the original Datasheet questionnaire for healthcare-specific applications. We address the collection and use of sensitive attributes, dataset versioning and maintenance, privacy, data collection context, and health-related devices. As part of the development process of Healthsheet, we worked with three publicly available healthcare datasets as case studies, each with a different type of structured data: Electronic Health Records (EHR), multiple sclerosis (MS) clinical trial data, and smartphone-based performance outcome measures.
