Federated Learning via Posterior Inference: A New Perspective and Practical Algorithms
Abstract
Federated learning is typically approached as a distributed optimization problem, where the goal is to minimize a global loss function by distributing computation across many client devices that possess local data and specify different parts of the global objective. We present an alternative perspective and formulate federated learning as inference of the global posterior distribution over model parameters. While exact inference is often intractable, this perspective provides a consistent way to search for global optima in federated settings. Further, starting from an analysis of federated quadratic objectives, we develop a computation- and communication-efficient approximate posterior inference algorithm---\emph{federated posterior averaging} (\FedPA). Our algorithm uses MCMC for approximate inference of local posteriors on the clients and efficiently communicates their statistics to the server, which uses them to iteratively refine the global estimate of the posterior mode. Finally, we show that \FedPA generalizes federated averaging (\FedAvg), can similarly benefit from adaptive optimizers, and yields state-of-the-art results on four realistic and challenging benchmarks, converging faster to better optima.
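To make the posterior-inference perspective concrete, the following sketch illustrates the factorization that underlies it. The notation ($D_k$ for client $k$'s data, $\theta$ for model parameters, $\mathcal{N}(\mu_k, \Sigma_k)$ for a Gaussian approximation of the local posterior) is ours and is meant only as an illustration under a uniform prior, not as a statement of the exact algorithm. With the data partitioned across $K$ clients,
\[
  P(\theta \mid D) \;\propto\; \prod_{k=1}^{K} P(\theta \mid D_k),
  \qquad D = D_1 \cup \dots \cup D_K,
\]
and, when each local posterior is approximated by $\mathcal{N}(\mu_k, \Sigma_k)$ (the quadratic-objective case), the global posterior is Gaussian with
\[
  \Sigma = \Big(\textstyle\sum_{k=1}^{K} \Sigma_k^{-1}\Big)^{-1},
  \qquad
  \mu = \Sigma \textstyle\sum_{k=1}^{K} \Sigma_k^{-1} \mu_k,
\]
so the global mode is an inverse-covariance-weighted average of the local means; simple parameter averaging recovers it only in the special case where all $\Sigma_k$ coincide.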