Kallista Bonawitz

Kallista Bonawitz

Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract Building privacy-preserving systems for machine learning and data science on decentralized data View details
    Preview abstract Secure aggregation is a cryptographic primitive that enables a server to learn the sum of the vector inputs of many clients. Bonawitz et al. (CCS 2017) presented a construction that incurs computation and communication for each client linear in the number of parties. While this functionality enables a broad range of privacy preserving computational tasks, scaling concerns limit its scope of use. We present the first constructions for secure aggregation that achieve polylogarithmic communication and computation per client. Our constructions provide security in the semi-honest and the semi-malicious setting where the adversary controls the server and a γ-fraction of the clients, and correctness with up to δ-fraction dropouts among the clients. Our constructions show how to replace the complete communication graph of Bonawitz et al., which entails the linear overheads, with a k-regular graph of logarithmic degree while maintaining the security guarantees. Beyond improving the known asymptotics for secure aggregation, our constructions also achieve very efficient concrete parameters. The semi-honest secure aggregation can handle a billion clients at the per client cost of the protocol of Bonawitz et al. for a thousand clients. In the semi-malicious setting with 104 clients, each client needs to communicate only with 3% of the clients to have a guarantee that its input has been added together with the inputs of at least 5000 other clients, while withstanding up to 5% corrupt clients and 5% dropouts. We also show an application of secure aggregation to the task of secure shuffling which enables the first cryptographically secure instantiation of the shuffle model of differential privacy. View details
    Context-Aware Local Differential Privacy
    Jayadev Acharya
    Ziteng Sun
    International Conference on Machine Learning (ICML) (2020)
    Preview abstract Local differential privacy (LDP) is a strong notion of privacy for individual users that often comes at the expense of a significant drop in utility. The classical definition of LDP assumes that all elements in the data domain are equally sensitive. However, in many applications, some symbols are more sensitive than others. This work proposes a context-aware framework of local differential privacy that allows a privacy designer to incorporate the application's context into the privacy definition. For binary data domains, we provide a universally optimal privatization scheme and highlight its connections to Warner's randomized response (RR) and Mangat's improved response. Motivated by geolocation and web search applications, for k-ary data domains, we consider two special cases of context-aware LDP: block-structured LDP and high-low LDP. We study discrete distribution estimation and provide communication-efficient, sample-optimal schemes and information-theoretic lower bounds for both models. We show that using contextual information can require fewer samples than classical LDP to achieve the same accuracy. View details
    Federated Learning with Autotuned Communication-Efficient Secure Aggregation
    Fariborz Salehi
    Jakub Konečný
    Marco Gruteser
    Asilomar Conference on Signals, Systems, and Computers (2019)
    Preview abstract Federated Learning enables mobile devices to collaboratively learn a shared inference model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. Existing work on federated learning with limited communication demonstrates how random rotation can enable users' model updates to be quantized much more efficiently, reducing the communication cost between users and the server. Meanwhile, secure aggregation enable the server to learn an aggregate of at least a threshold number of device's model contributions without observing any individual device's contribution in unaggregated form. In this paper, we highlight some of the challenges of setting the parameters for secure aggregation to achieve communication efficiency, especially in the the context of the aggressively quantized inputs enabled by random rotation. We then develop a recipe for auto-tuning communication-efficient secure aggregation, based on specific properties of random rotation and secure aggregation -- namely, the predictable distribution of vector entries post-rotation and the modular wrapping inherent in secure aggregation. We present both theoretical results and initial experiments. View details
    Advances and Open Problems in Federated Learning
    Brendan Avent
    Aurélien Bellet
    Mehdi Bennis
    Arjun Nitin Bhagoji
    Graham Cormode
    Rachel Cummings
    Rafael G.L. D'Oliveira
    Salim El Rouayheb
    David Evans
    Josh Gardner
    Adrià Gascón
    Phillip B. Gibbons
    Marco Gruteser
    Zaid Harchaoui
    Chaoyang He
    Lie He
    Zhouyuan Huo
    Justin Hsu
    Martin Jaggi
    Tara Javidi
    Gauri Joshi
    Mikhail Khodak
    Jakub Konečný
    Aleksandra Korolova
    Farinaz Koushanfar
    Sanmi Koyejo
    Tancrède Lepoint
    Yang Liu
    Prateek Mittal
    Richard Nock
    Ayfer Özgür
    Rasmus Pagh
    Ramesh Raskar
    Dawn Song
    Weikang Song
    Sebastian U. Stich
    Ziteng Sun
    Florian Tramèr
    Praneeth Vepakomma
    Jianyu Wang
    Li Xiong
    Qiang Yang
    Felix X. Yu
    Han Yu
    Arxiv (2019)
    Preview abstract Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g., service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and mitigates many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents a comprehensive list of open problems and challenges. View details
    Towards Federated Learning at Scale: System Design
    Hubert Eichner
    Wolfgang Grieskamp
    Dzmitry Huba
    Vladimir Ivanov
    Chloé M Kiddon
    Jakub Konečný
    Stefano Mazzocchi
    Timon Van Overveldt
    David Petrou
    Jason Roselander
    SysML 2019
    Preview abstract Federated Learning is a distributed machine learning approach which enables training on a large corpus of data which never needs to leave user devices. We have spent some effort over the last two years building a scalable production system for FL. In this paper, we report about the resulting high-level design, sketching the challenges and the solutions, as well as touching the open problems and future directions. View details
    Preview abstract We design a novel, communication-efficient, failure-robust protocol for secure aggregation of high-dimensional data. Our protocol allows a server to collect an aggregate of user-held data from mobile devices in a privacy-preserving manner, and can be used, for example, in a federated learning setting, to aggregate user-provided model updates for a deep neural network. We prove the security of our protocol in the honest-but-curious and malicious server settings, and show that privacy is preserved even if an arbitrarily chosen subset of users drop out at any time. We evaluate the efficiency of our protocol and show, by complexity analysis and a concrete implementation, that its runtime and communication overhead remain low even on large data sets and client pools. For 16-bit input values, our protocol offers 1.73× communication expansion for 2^10 users and 2^20-dimensional vectors, and 1.98× expansion for 2^14 users and 2^24-dimensional vectors. View details
    Practical Secure Aggregation for Federated Learning on User-Held Data
    Vladimir Ivanov
    Ben Kreuter
    Antonio Marcedone
    Sarvar Patel
    NIPS Workshop on Private Multi-Party Machine Learning (2016)
    Preview abstract Secure Aggregation is a class of Secure Multi-Party Computation algorithms wherein a group of mutually distrustful parties u ∈ U each hold a private value x_u and collaborate to compute an aggregate value, such as the sum_{u∈U} x_u, without revealing to one another any information about their private value except what is learnable from the aggregate value itself. In this work, we consider training a deep neural network in the Federated Learning model, using distributed gradient descent across user-held training data on mobile devices, wherein Secure Aggregation protects the privacy of each user’s model gradient. We identify a combination of efficiency and robustness requirements which, to the best of our knowledge, are unmet by existing algorithms in the literature. We proceed to design a novel, communication-efficient Secure Aggregation protocol for high-dimensional data that tolerates up to 1/3 users failing to complete the protocol. For 16-bit input values, our protocol offers 1.73x communication expansion for 2^10 users and 2^20-dimensional vectors, and 1.98x expansion for 2^14 users and 2^24 dimensional vectors. View details
    Preview abstract The collection and analysis of user data drives improvements in the app and web ecosystems, but comes with risks to privacy. This paper examines discrete distribution estimation under local privacy, a setting wherein service providers can learn the distribution of a categorical statistic of interest without collecting the underlying data. We present new mechanisms, including hashed k-ary Randomized Response (k-RR), that empirically meet or exceed the utility of existing mechanisms at all privacy levels. New theoretical results demonstrate the order-optimality of k-RR and the existing RAPPOR mechanism at different privacy regimes. View details