Google Research

Privacy-first Health Research with Federated Learning

medRxiv (2021)


Privacy protection is paramount in designing and running health studies. However, most health research to date uses data stored in a centralized database, where analysis and model fitting are done with full access to the sensitive underlying data. Recent advances in federated learning make it possible to build complex machine-learned models that are trained in a purely distributed fashion, so that private data never leaves the owner’s device but can still contribute to improving a global model. Here we show that federated models achieve the same level of accuracy, predictive power, and generalizability to previously unseen data as models built in the classical, centralized way, and that they lead to the same interpretation. We demonstrate this side by side on a spectrum of open-data health studies ranging from diabetes to heart disease. This work is the first to apply modern federated learning methods to clinical studies, and it demonstrates how privacy-first studies involving multi-modal data can be run. We make all code used for this research available as open source, and all data sets used are publicly available.
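
To make the idea of training without moving the data concrete, the following is a minimal sketch of federated averaging on a toy least-squares problem. It is not the code released with this work: the function names (`local_step`, `federated_round`, `make_toy_clients`, `train`), the synthetic data, and the choice of a single local gradient step per round are all assumptions made for illustration. Each simulated client computes an update on its own examples, and the server only ever receives model weights and example counts.

```python
import random


def local_step(weights, data, lr):
    """One gradient-descent step of least-squares regression on one data set."""
    n = len(data)
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += 2.0 * err * xi / n
    return [w - lr * g for w, g in zip(weights, grad)]


def federated_round(weights, clients, lr):
    """Clients update locally; the server averages weights by example count.

    The raw (x, y) pairs are only read inside local_step, which stands in
    for computation that would run on the data owner's device.
    """
    total = sum(len(data) for data in clients)
    averaged = [0.0] * len(weights)
    for data in clients:
        local = local_step(weights, data, lr)
        for j, w in enumerate(local):
            averaged[j] += w * len(data) / total
    return averaged


def make_toy_clients(seed=0):
    """Synthetic data (an assumption): y = 0.5 + 2.0 * x plus small noise."""
    rng = random.Random(seed)
    clients = []
    for size in (5, 8, 12):
        data = []
        for _ in range(size):
            x = [1.0, rng.uniform(-1.0, 1.0)]  # constant feature acts as the intercept
            y = 0.5 + 2.0 * x[1] + rng.gauss(0.0, 0.1)
            data.append((x, y))
        clients.append(data)
    return clients


def train(rounds=200, lr=0.1):
    """Train a federated and a centralized model side by side."""
    clients = make_toy_clients()
    pooled = [example for data in clients for example in data]
    fed = [0.0, 0.0]
    central = [0.0, 0.0]
    for _ in range(rounds):
        fed = federated_round(fed, clients, lr)
        central = local_step(central, pooled, lr)
    return fed, central


if __name__ == "__main__":
    fed, central = train()
    print("federated:  ", fed)
    print("centralized:", central)
```

In this simplified setting the two models agree up to floating-point error, because the example-weighted average of one-step client updates equals a gradient step on the pooled data. With several local steps per round, non-linear models, or unevenly distributed data, the agreement is only approximate, which is why an empirical side-by-side comparison of the kind described above is needed.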
