Google Research

Privacy-first Health Research with Federated Learning

medRxiv (2021)


Privacy protection is paramount in designing and running health studies. However, most health research to date uses data stored in a centralized database, where analysis and model fitting are done with full access to the sensitive underlying data. Recent advances in federated learning enable building complex machine-learned models that are trained in a purely distributed fashion, such that private data never leaves the owner’s device but can still contribute to the improvement of a global model. Here we show that federated models achieve the same level of accuracy, predictive power, and generalizability to previously unseen data, and lead to the same interpretation, as models built in a classical, centralized way. We demonstrate this side by side on a spectrum of open-data health studies ranging from diabetes to heart disease. This work is the first to apply modern federated learning methods to clinical studies, and it demonstrates how privacy-first studies involving multi-modal data can be run. We make all code used for this research available as open source, and all data sets used are publicly available.
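To make the idea of distributed training concrete, the following is a minimal sketch of federated averaging on synthetic data. It is not the code released with this work. The model (a one-variable linear regression), the data generator, and all function names and hyperparameters (`local_update`, `federated_average`, `train_federated`, `make_client`, `lr`, `epochs`, `rounds`) are assumptions chosen for illustration. Each simulated client fits the model on its own data and shares only the resulting parameters, which a coordinator averages into the global model.

```python
import random


def local_update(weights, data, lr=0.5, epochs=5):
    """Run a few gradient-descent steps on one client's private data.

    Only the updated parameters are returned; the raw (x, y) pairs
    never leave this function.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Average client parameters, weighted by each client's sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def train_federated(clients, rounds=30):
    """Alternate local training and server-side averaging for several rounds."""
    global_weights = (0.0, 0.0)
    sizes = [len(data) for data in clients]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


def make_client(rng, n):
    """Hypothetical synthetic data: y = 2x + 1 plus small Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)) for x in xs]


rng = random.Random(0)
clients = [make_client(rng, n) for n in (20, 35, 50)]
w, b = train_federated(clients)
print(f"global model: y = {w:.3f} * x + {b:.3f}")
```

In this toy setting the averaged model should recover a slope near 2 and an intercept near 1, even though no client's data is ever pooled. Production systems typically add client sampling, secure aggregation, and other safeguards that this sketch omits.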

