With privacy as a motivation, Federated Learning (FL) is an increasingly used paradigm where learning takes place collectively on edge devices, with user-generated training examples that never leave the device. These on-device training examples are gathered in situ during the course of users’ interactions with their devices, and thus are highly reflective of at least part of the inference data distribution. Yet gaps may still exist, where on-device training examples are lacking for some data inputs expected to be encountered at inference time. This paper proposes a way to mitigate these gaps: selective usage of datacenter data, mixed in with FL. By mixing decentralized (federated) and centralized (datacenter) data, we can form an effective training data distribution that better matches the inference data distribution, resulting in more useful models.