
Zachary Garrett
Authored Publications
Leveraging Function Space Aggregation for Federated Learning at Scale
Karolina Dziugaite
Nikita Dhawan
Transactions on Machine Learning Research (2024)
Abstract:
The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, we adopt a function space perspective and propose a new algorithm, FedFish, that aggregates local approximations to the functions learned by clients, using an estimate based on their Fisher information. We evaluate FedFish on realistic, large-scale cross-device benchmarks. While the performance of FedAvg can suffer as client models drift further apart, we demonstrate that FedFish is more robust to longer local training. Our evaluation across several settings in image and language benchmarks shows that FedFish outperforms FedAvg as local training epochs increase. Further, FedFish results in global networks that are more amenable to efficient personalization via local fine-tuning on the same or shifted data distributions. For instance, federated pretraining on the C4 dataset, followed by few-shot personalization on Stack Overflow, results in a 7% improvement in next-token prediction by FedFish over FedAvg.
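The core aggregation step can be illustrated concretely. Below is a minimal NumPy sketch of diagonal-Fisher-weighted parameter merging, one way to realize the function-space aggregation described above; the function name `fisher_weighted_aggregate` and the argument names are illustrative, and the published FedFish update may differ in its details.

```python
import numpy as np

def fisher_weighted_aggregate(client_params, client_fishers, eps=1e-8):
    """Merge client models by weighting each parameter with its (diagonal)
    Fisher information, a rough proxy for how strongly the client's
    learned function depends on that parameter."""
    params = np.stack(client_params)    # shape: (num_clients, dim)
    fishers = np.stack(client_fishers)  # shape: (num_clients, dim)
    # Coordinates a client is "confident" about (high Fisher) dominate the
    # merge; eps keeps the division well defined when all clients have
    # near-zero curvature for a coordinate.
    return (fishers * params).sum(axis=0) / (fishers.sum(axis=0) + eps)

# Toy usage: two clients, each confident about a different coordinate.
theta_a, fisher_a = np.array([1.0, 0.0]), np.array([10.0, 0.1])
theta_b, fisher_b = np.array([0.0, 2.0]), np.array([0.1, 10.0])
print(fisher_weighted_aggregate([theta_a, theta_b], [fisher_a, fisher_b]))
```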
Federated Automatic Differentiation
Journal of Machine Learning Research (JMLR), 25 (2024), pp. 1-39
Abstract:
Federated learning (FL) is a framework for learning across an axis of group partitioned data (heterogeneous clients) while preserving data privacy, under the orchestration of a central server. FL methods often compute gradients of loss functions purely locally (e.g. at each client), typically using automatic differentiation (AD) techniques. In this work, we consider the problem of applying AD to federated computations while preserving compatibility with privacy-enhancing technologies. We propose a framework, federated automatic differentiation (federated AD), that 1) enables computing derivatives of functions involving client and server computation as well as communication between them and 2) operates in a manner compatible with existing federated technology. We show, in analogy with AD, that federated AD may be implemented using various accumulation modes, which introduce distinct computation-communication trade-offs and systems requirements. Further, we show that a broad class of federated computations is closed under these modes of federated AD, implying that if the original computation can be implemented using privacy-preserving primitives, its derivative may be computed using the same primitives. We then show how federated AD can be used to create algorithms that dynamically learn components of the algorithm itself. We demonstrate that performance of FedAvg-style algorithms can be significantly improved by using federated AD in this manner.
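To make the accumulation idea concrete, here is a toy NumPy sketch (not the paper's framework or API) in which a loss composed of a server-to-client broadcast, per-client computation, and a federated mean is differentiated using only the adjoints of those same primitives; all names below are illustrative.

```python
import numpy as np

def broadcast(server_value, num_clients):
    # server -> clients; its adjoint is a sum over client cotangents.
    return [server_value] * num_clients

def federated_mean(client_values):
    # clients -> server; its adjoint broadcasts a 1/num_clients scaling.
    return sum(client_values) / len(client_values)

def client_loss(theta, data):
    # Per-client quadratic loss; its gradient is (theta - data).
    return 0.5 * (theta - data) ** 2

def client_loss_grad(theta, data):
    return theta - data

def federated_loss(theta, client_data):
    local = broadcast(theta, len(client_data))
    return federated_mean([client_loss(t, d) for t, d in zip(local, client_data)])

def federated_loss_grad(theta, client_data):
    # Reverse mode built from the same primitives: the federated mean
    # contributes a 1/n factor, and the broadcast adjoint sums over clients.
    n = len(client_data)
    local = broadcast(theta, n)
    return sum(client_loss_grad(t, d) / n for t, d in zip(local, client_data))

data = [1.0, 3.0, 5.0]
theta = 0.5
analytic = federated_loss_grad(theta, data)
numeric = (federated_loss(theta + 1e-6, data) - federated_loss(theta - 1e-6, data)) / 2e-6
print(analytic, numeric)  # the two should agree closely
```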
Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning
Krishna Pillutla
Michael Reneer
37th Conference on Neural Information Processing Systems (NeurIPS 2023), Datasets and Benchmarks Track
Abstract:
We introduce Dataset Grouper, a library to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library facilitates the creation of group-structured versions of existing datasets based on user-specified partitions and directly leads to a variety of useful heterogeneous datasets that can be plugged into existing software frameworks. Dataset Grouper offers three key advantages. First, it scales to settings where even a single group's dataset is too large to fit in memory. Second, it provides flexibility, both in choosing the base (non-partitioned) dataset and in defining partitions. Finally, it is framework-agnostic. We empirically demonstrate that Dataset Grouper enables large-scale federated language modeling simulations on datasets that are orders of magnitude larger than in previous work, allowing for federated training of language models with hundreds of millions, and even billions, of parameters. Our experimental results show that algorithms like FedAvg operate more as meta-learning methods than as empirical risk minimization methods at this scale, suggesting their utility in downstream personalization and task-specific adaptation. Dataset Grouper is available at https://github.com/google-research/dataset_grouper.
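The partitioning pattern the library supports can be sketched in a few lines of plain Python. Note that this is not Dataset Grouper's actual API; it only illustrates streaming group-structured partitioning where group ids come from a user-specified function, and all names here are hypothetical.

```python
import collections
import json

def partition_by_group(examples, group_fn, open_shard):
    """Stream examples into per-group shards so that no single group's
    data has to fit in memory.

    examples:   an iterable of examples (e.g. dicts).
    group_fn:   user-specified function mapping an example to a group id.
    open_shard: callable returning a writable file-like object per group.
    """
    writers = {}
    counts = collections.Counter()
    for example in examples:
        group = group_fn(example)
        if group not in writers:
            writers[group] = open_shard(group)
        writers[group].write(json.dumps(example) + "\n")
        counts[group] += 1
    for writer in writers.values():
        writer.close()
    return counts

# Toy usage: group a small text corpus by document domain.
corpus = [{"domain": "news", "text": "..."}, {"domain": "web", "text": "..."}]
counts = partition_by_group(
    corpus,
    group_fn=lambda ex: ex["domain"],
    open_shard=lambda group: open(f"{group}.jsonl", "w"))
print(counts)
```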
Abstract:
Model sizes are limited in Federated Learning due to communication bandwidth constraints and on-device memory constraints. The success of scaling model sizes in other machine learning domains, especially when it comes to generalizing to new data distributions, motivates the development of methods of training large scale models in Federated Learning. Inspired by dropout, [3] proposed Federated Dropout as a way of scaling up model sizes: clients train randomly selected subsets of the larger server model. In spite of the promising empirical results and the many other works that build on it [1, 8, 13], we argue in this paper that the metrics used to measure performance of Federated Dropout and its variants are misleading. We propose and perform new experiments which suggest that Federated Dropout is actually detrimental to scaling efforts. We show how a simple ensembling technique outperforms Federated Dropout and other baselines. We perform ablations which suggest that the best performing variations of Federated Dropout attempt to approximate ensembling. The simplicity of ensembling allows for easy, practical implementations. Furthermore, our ensembling technique naturally leverages the parallelizable nature of Federated Learning—recall that it is easy to train several models independently because there are a lot of clients and server-compute is not the bottleneck. Ensembling’s strong performance against our baselines suggests that Federated Learning models may be more easily scaled than previously thought e.g., via boosting.
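The prediction-averaging step of such an ensemble is straightforward. The sketch below (NumPy, with illustrative names) simply averages the class probabilities of independently trained models; how the paper assigns client subsets to ensemble members is not shown.

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the predictive distributions of independently trained models.
    `models` is a list of callables mapping inputs to class probabilities."""
    probs = np.stack([m(x) for m in models])  # (num_models, batch, classes)
    return probs.mean(axis=0)

# Toy usage with two stand-in "models" that ignore their input.
model_a = lambda x: np.array([[0.9, 0.1]])
model_b = lambda x: np.array([[0.6, 0.4]])
print(ensemble_predict([model_a, model_b], x=None))  # [[0.75 0.25]]
```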
A Field Guide to Federated Optimization
Suhas Diggavi
Chaoyang He
Mahdi Soltanolkotabi
Maruan Al-Shedivat
Chen Zhu
Peter Richtarik
Honglin Yuan
Ameet Talwalkar
Sebastian Stich
Sanmi Koyejo
Hongyi Wang
Deepesh Data
Blake Woodworth
Filip Hanzely
A. Salman Avestimehr
Tian Li
Jianyu Wang
Samuel Horvath
Antonious M. Girgis
Mi Zhang
Advait Gadhikar
Martin Jaggi
Gauri Joshi
Tara Javidi
Virginia Smith
Sai Praneeth Karimireddy
Karan Singhal
Jakub Konečný
Manzil Zaheer
Satyen Chandrakant Kale
Chunxiang (Jake) Zheng
Weikang Song
Galen Andrew
Katharine Daly
Tong Zhang
Hubert Eichner
arXiv (2021)
Abstract:
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms through concrete examples and practical implementation, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.
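Many of the algorithms such guidelines cover instantiate a single generalized template: sampled clients run local training, and the server applies the averaged model delta as a pseudo-gradient. Below is a minimal sketch of that template with illustrative names (`generalized_fedavg`, `local_train`, `sample_cohort`) and plain SGD as the server optimizer; it is not the paper's reference implementation.

```python
import numpy as np

def generalized_fedavg(x0, sample_cohort, local_train, server_lr=1.0, rounds=10):
    """Sketch of the generalized FedAvg template: sampled clients run local
    training from the current global model, and the server treats the
    negated average model delta as a pseudo-gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(rounds):
        cohort = sample_cohort()
        # local_train(x, client) returns the client's locally trained model.
        deltas = [local_train(x, client) - x for client in cohort]
        pseudo_grad = -np.mean(deltas, axis=0)
        # Plain SGD as the server optimizer; adaptive server optimizers
        # would plug in here instead.
        x = x - server_lr * pseudo_grad
    return x

# Toy usage: clients hold scalar targets and take one local gradient step.
client_targets = {0: 1.0, 1: 3.0, 2: 8.0}
local_train = lambda x, c: x - 0.5 * (x - client_targets[c])
sample_cohort = lambda: [0, 1, 2]
print(generalized_fedavg(np.array(0.0), sample_cohort, local_train, rounds=50))
```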
Abstract:
The federated learning (FL) framework trains a machine learning model using decentralized data stored at edge client devices by periodically aggregating locally trained models. Popular optimization algorithms of FL use vanilla (stochastic) gradient descent for both local updates at clients and global updates at the aggregating server. Recently, adaptive optimization methods such as AdaGrad have been studied for server updates. However, the effect of using adaptive optimization methods for local updates at clients is not yet understood. We show in both theory and practice that while local adaptive methods can accelerate convergence, they can cause a non-vanishing solution bias, where the final converged solution may be different from the stationary point of the global objective function. We propose correction techniques to overcome this inconsistency and complement the local adaptive methods for FL. Extensive experiments on realistic federated training tasks show that the proposed algorithms can achieve faster convergence and higher test accuracy than the baselines without local adaptivity.
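For concreteness, the sketch below (NumPy, illustrative names, toy quadratic clients) runs a few steps of Adagrad locally at each client before plain server averaging; the per-coordinate preconditioning differs across clients, which is the source of the bias discussed above, and the paper's correction techniques are not reproduced here.

```python
import numpy as np

def local_adagrad_update(x_global, grad_fn, steps=5, lr=0.1, eps=1e-6):
    """Client-side update: a few steps of Adagrad starting from the
    global model. Returns the locally trained model."""
    x = np.array(x_global, dtype=float)
    accum = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        accum += g ** 2
        x -= lr * g / (np.sqrt(accum) + eps)
    return x

# One FedAvg-style round with locally adaptive clients on toy objectives.
client_grads = [lambda x: x - np.array([1.0, 0.0]),
                lambda x: 10.0 * (x - np.array([0.0, 1.0]))]
x_server = np.zeros(2)
local_models = [local_adagrad_update(x_server, g) for g in client_grads]
x_server = np.mean(local_models, axis=0)
print(x_server)
```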
Adaptive Federated Optimization
Jakub Konečný
Manzil Zaheer
(2021)
Abstract:
Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Due to the heterogeneity of the client datasets, standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. In non-federated settings, adaptive optimization methods have had notable success in combating such issues. In this work, we propose federated versions of adaptive optimizers, including Adagrad, Yogi and Adam, and analyze their convergence in the presence of heterogeneous data for general nonconvex settings. Our results highlight the interplay between client heterogeneity and communication efficiency. We also perform extensive experiments on these methods and show that the use of adaptive optimizers can improve the performance of federated learning.
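A server-side Adam-style update on the averaged client delta can be sketched as follows. Hyperparameter names (`b1`, `b2`, `tau`) follow common usage, and the exact published updates for FedAdagrad, FedYogi, and FedAdam differ in how they treat the second moment; this is a minimal illustration, not the paper's implementation.

```python
import numpy as np

def fedadam_round(x, client_deltas, state, lr=0.01, b1=0.9, b2=0.99, tau=1e-3):
    """One server update of a FedAdam-style method: the averaged client
    model delta is treated as a pseudo-gradient and fed to an Adam-like
    server optimizer."""
    delta = np.mean(client_deltas, axis=0)
    m, v = state
    m = b1 * m + (1 - b1) * delta
    v = b2 * v + (1 - b2) * delta ** 2
    x_new = x + lr * m / (np.sqrt(v) + tau)
    return x_new, (m, v)

# Toy usage over a few rounds with synthetic client deltas.
x = np.zeros(3)
state = (np.zeros(3), np.zeros(3))
for _ in range(5):
    deltas = [np.array([0.1, -0.2, 0.05]), np.array([0.3, -0.1, 0.0])]
    x, state = fedadam_round(x, deltas, state)
print(x)
```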
Abstract:
Personalization methods in federated learning aim to balance the benefits of federated and local training for data availability, communication cost, and robustness to client heterogeneity. Approaches that require clients to communicate all model parameters can be undesirable due to privacy and communication constraints. Other approaches require always-available or stateful clients, impractical in large-scale cross-device settings. We introduce Federated Reconstruction, the first model-agnostic framework for partially local federated learning suitable for training and inference at scale. We motivate the framework via a connection to model-agnostic meta learning, empirically demonstrate its performance over existing approaches for collaborative filtering and next word prediction, and release an open-source library for evaluating approaches in this setting. We also describe the successful deployment of this approach at scale for federated collaborative filtering in a mobile keyboard application.
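One round of the partially local pattern can be sketched as follows (NumPy, toy loss, illustrative names): the client first reconstructs its never-uploaded local parameters with the global parameters frozen, then updates the global parameters with the local ones frozen, sending back only the global delta. The loss, data split, and shapes here are assumptions for illustration.

```python
import numpy as np

def reconstruct_then_update(g_global, client_data, local_dim,
                            recon_steps=5, update_steps=5, lr=0.1):
    """Sketch of one partially local client round: reconstruct local
    parameters, then update global parameters, and return only the
    global-parameter delta for the server."""
    g = np.array(g_global, dtype=float)
    l = np.zeros(local_dim)            # local params are never uploaded
    support, query = client_data       # two halves of the client's data

    def grads(g, l, batch):
        # Toy loss: squared error between (g + l) and the batch mean.
        residual = (g + l) - batch.mean(axis=0)
        return residual, residual      # same gradient w.r.t. g and l here

    for _ in range(recon_steps):       # reconstruction: train l, freeze g
        _, gl = grads(g, l, support)
        l -= lr * gl
    for _ in range(update_steps):      # update: train g, freeze l
        gg, _ = grads(g, l, query)
        g -= lr * gg
    return g - g_global                # only this delta goes to the server

delta = reconstruct_then_update(
    g_global=np.zeros(2),
    client_data=(np.ones((4, 2)), 2 * np.ones((4, 2))),
    local_dim=2)
print(delta)
```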
On Large-Cohort Training for Federated Learning
Virginia Smith
Sergei Shmulyian
Advances in Neural Information Processing Systems (2021)
Abstract:
Federated learning methods typically learn a model by iteratively sampling updates from a population of clients. In this work, we explore how the number of clients sampled at each round (the cohort size) impacts the quality of the learned model and the training dynamics of federated learning algorithms. Our work poses three fundamental questions. First, what challenges arise when trying to scale federated learning to larger cohorts? Second, what parallels exist between cohort sizes in federated learning and batch sizes in centralized learning? Last, how can we design federated learning methods that effectively utilize larger cohort sizes? We give partial answers to these questions based on extensive empirical evaluation. Our work highlights a number of challenges stemming from the use of larger cohorts. While some of these (such as generalization issues and diminishing returns) are analogs of large-batch training challenges, others (including training failures and fairness concerns) are unique to federated learning.
Advances and Open Problems in Federated Learning
Zaid Harchaoui
Zhouyuan Huo
Justin Hsu
Dawn Song
Mehdi Bennis
Aleksandra Korolova
Prateek Mittal
Lie He
Phillip B. Gibbons
Gauri Joshi
Graham Cormode
Rafael G.L. D'Oliveira
Felix X. Yu
Salim El Rouayheb
Sebastian U. Stich
Josh Gardner
Jianyu Wang
Brendan Avent
Qiang Yang
Han Yu
Arjun Nitin Bhagoji
Aurélien Bellet
Ayfer Özgür
Sanmi Koyejo
Florian Tramèr
Farinaz Koushanfar
Li Xiong
Ramesh Raskar
David Evans
Praneeth Vepakomma
Tara Javidi
Chaoyang He
Mikhail Khodak
Martin Jaggi
Yang Liu
Richard Nock
Ziteng Sun
Rachel Cummings
Jakub Konečný
Rasmus Pagh
Tancrède Lepoint
Marco Gruteser
Weikang Song
Adrià Gascón
arXiv (2019)
Abstract:
Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g., service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and mitigates many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents a comprehensive list of open problems and challenges.