Jointly Learning from Decentralized (Federated) and Centralized Data to Mitigate Distribution Shift

Andrew Hard; Kurt Partridge; Rajiv Mathews; Sean Augenstein

NeurIPS 2021 Workshop on Distribution Shifts (2021) (to appear)

Abstract

With privacy as a motivation, Federated Learning (FL) is an increasingly used paradigm where learning takes place collectively on edge devices, with user-generated training examples that never leave the device. These on-device training examples are gathered in situ during the course of users’ interactions with their devices, and thus are highly reflective of at least part of the inference data distribution. Yet gaps may still exist, where on-device training examples are lacking for some data inputs expected to be encountered at inference time. This paper proposes a way to mitigate these gaps: selective usage of datacenter data, mixed in with FL. By mixing decentralized (federated) and centralized (datacenter) data, we can form an effective training data distribution that better matches the inference data distribution, resulting in more useful models.
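
The mixing idea can be illustrated with a toy sketch. This is not the paper's algorithm, which the abstract does not specify; it is a minimal illustration that assumes a one-dimensional linear model, a single local SGD pass per client, and a hypothetical `mix_weight` knob that blends the averaged federated update with an update computed on datacenter data. All function and parameter names here are illustrative.

```python
def sgd_delta(weights, examples, lr=0.1):
    """Run one SGD pass on a linear model y = w*x + b and return the weight change."""
    w, b = weights
    for x, y in examples:
        err = (w * x + b) - y      # prediction error under squared loss
        w -= lr * err * x
        b -= lr * err
    return (w - weights[0], b - weights[1])


def mixed_round(weights, client_datasets, datacenter_batch, mix_weight=0.25, lr=0.1):
    """One training round that blends a federated update with a centralized one.

    client_datasets:  list of per-device example lists (never pooled together here).
    datacenter_batch: examples held centrally, chosen to cover inputs the devices lack.
    mix_weight:       hypothetical blending knob; 0 = federated only, 1 = centralized only.
    """
    # Federated part: each client computes a delta locally; only deltas are averaged.
    client_deltas = [sgd_delta(weights, ds, lr) for ds in client_datasets]
    n = len(client_deltas)
    fed = (sum(d[0] for d in client_deltas) / n,
           sum(d[1] for d in client_deltas) / n)

    # Centralized part: the same update rule applied to datacenter examples.
    cen = sgd_delta(weights, datacenter_batch, lr)

    # Blend the two updates so the effective training distribution is a mixture.
    dw = (1 - mix_weight) * fed[0] + mix_weight * cen[0]
    db = (1 - mix_weight) * fed[1] + mix_weight * cen[1]
    return (weights[0] + dw, weights[1] + db)


if __name__ == "__main__":
    # Devices only see small inputs; the datacenter batch covers larger ones.
    clients = [[(0.1, 0.2), (0.2, 0.4)], [(0.3, 0.6)]]
    datacenter = [(1.5, 3.0), (2.0, 4.0)]
    model = (0.0, 0.0)
    for _ in range(200):
        model = mixed_round(model, clients, datacenter)
    print(model)
```

In this sketch, setting `mix_weight` to 0 recovers plain federated averaging over the clients, and raising it shifts the effective training distribution toward the datacenter examples.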

Research Areas

Machine intelligence
