Badih Ghazi
I am a Research Scientist in the Algorithms & Optimization Team at Google. Here's a link to my personal webpage
Authored Publications
Google Publications
Other Publications
Sort By
Preview abstract
In this work, we study the task of estimating the numbers of distinct and k-occurring items in a time window under the constraint of differential privacy (DP). We consider several variants depending on whether the queries are on general time windows (between times t1 and t2), or are restricted to being cumulative (between times 1 and t2), and depending on whether the DP neighboring relation is event-level or the more stringent item-level. We obtain nearly tight upper and lower bounds on the errors of DP algorithms for these problems. En route, we obtain an event-level DP algorithm for estimating, at each time step, the number of distinct items seen over the last W updates with error polylogarithmic in W; this answers an open question of Bolot et al. (ICDT 2013).
View details
Preview abstract
Differential privacy is often applied with a privacy parameter that is larger than the theory suggests is ideal; various informal justifications for tolerating large privacy parameters have been proposed.
In this work, we consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this framework, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller that the best possible privacy parameter for the entire record of a person (i.e., all the attributes).
View details
Differentially Private All-Pairs Shortest Path Distances: Improved Algorithms and Lower Bounds
Jelani Osei Nelson
Justin Y. Chen
Shyam Narayanan
Yinzhan Xu
SODA 2023 (to appear)
Preview abstract
We study the problem of releasing the weights of all-pairs shortest paths in a weighted undirected graph with differential privacy (DP). In this setting, the underlying graph is fixed and two graphs are neighbors if their edge weights differ by at most 1 in the ℓ1-distance. We give an algorithm with additive error ̃O(n^2/3/ε) in the ε-DP case and an algorithm with additive error ̃O(√n/ε) in the (ε, δ)-DP case, where n denotes the number of vertices. This positively answers a question of Sealfon [Sea16, Sea20], who asked whether a o(n) error algorithm exists. We also show that an additive error of Ω(n1/6) is necessary for any sufficiently small ε, δ > 0.
Furthermore, we show that if the graph is promised to have reasonably bounded weights, one can improve the error further to roughly n^{(√17−3)/2+o(1)}/ε in the ε-DP case and roughly n^{√2−1+o(1)}/ε in the (ε, δ)-DP case. Previously, it was only known how to obtain ̃O(n2/3/ε1/3) additive error in the ε-DP case and ̃O(√n/ε) additive error in the (ε, δ)-DP case for bounded-weight graphs [Sea16].
Finally, we consider a relaxation where a multiplicative approximation is allowed. We show that, with a multiplicative approximation factor k, the additive error can be reduced to ̃O(n^{1/2+O(1/k)}/ε) in the ε-DP case and ̃O(n^{1/3+O(1/k)}/ε) in the (ε, δ)-DP case.
View details
Leveraging Bias-Variance Trade-offs for Regression with Label Differential Privacy
Avinash Varadarajan
Chiyuan Zhang
Ethan Leeman
Pritish Kamath
NeurIPS 2023 (2023)
Preview abstract
We propose a new family of label randomization mechanisms for the task of training regression models under the constraint of label differential privacy (DP). In particular, we leverage the trade-offs between bias and variance to construct better noising mechanisms depending on a privately estimated prior distribution over the labels. We demonstrate that these mechanisms achieve state-of-the-art privacy-accuracy trade-offs on several datasets, highlighting the importance of bias-reducing constraints when training neural networks with label DP. We also provide theoretical results shedding light on the structural properties of the optimal bias-reduced mechanisms.
View details
Distributed, Private, Sparse Histograms in the Two-Server Model
Adria Gascon
James Bell
Phillipp Schoppmann
CCS 2022
Preview abstract
We consider the computation of sparse, (ε, ϑ)-differentially private~(DP) histograms in the two-server model of secure multi-party computation~(MPC), which has recently gained traction in the context of privacy-preserving measurements of aggregate user data. We introduce protocols that enable two semi-honest non-colluding servers to compute histograms over the data held by multiple users, while only learning a private view of the data. Our solution achieves the same asymptotic l∞-error of O(log(1/ϑ)/ε) as in the central model of DP, but without relying on a trusted curator. The server communication and computation costs of our protocol are independent of the number of histogram buckets, and are linear in the number of users, while the client cost is independent of the number of users, ε, and ϑ. Its linear dependence on the number of users lets our protocol scale well, which we confirm using microbenchmarks: for a billion users, ε = 0.5, and ϑ = 10-11, the per-user cost of our protocol is only 1.08 ms of server computation and 339 bytes of communication. In contrast, a baseline protocol using garbled circuits only allows up to 106 users, where it requires 600 KB communication per user.
View details
Preview abstract
We study the problem of privately computing the \emph{anonymized histogram} (aka \emph{unattributed histogram}), which is defined as the histogram without item labels. Previous works have provided algorithms with $\ell_1$ and $\ell_2$ errors of $O_\eps(\sqrt{n})$ in the central model of differential privacy (DP).
In this work, we provide an algorithm with a nearly matching error guarantee of $\tilde{O}_\eps(\sqrt{n})$ in the shuffle and pan private DP models. Our algorithm is very simple: it just post-processes the discrete Laplace-noised histogram! Using this algorithm as a subroutine, we show applications in estimating several symmetric properties of distributions such as the entropy and support size.
View details
Preview abstract
The privacy loss distribution (PLD) provides a tight characterization of the privacy loss of a mechanism in the context of differential privacy (DP). Recent work has shown that PLD-based accounting allows for tighter (ε,δ)-DP guarantees for many popular mechanisms compared to other known methods. A key question in PLD-based accounting is how to approximate any (potentially continuous) PLD with a PLD over any specified discrete support.
We present a novel approach to this problem. Our approach supports both pessimistic estimation, which overestimates the hockey-stick divergence (i.e., δ) for any value of ε, and optimistic estimation, which underestimates the hockey-stick divergence. Moreover, we show that our pessimistic estimate is the best possible among all pessimistic estimates. Experimental evaluation shows that our approach can work with much larger discretization intervals while keeping a similar error bound compared to previous approaches and yet give a better approximation than existing methods.
View details
Preview abstract
In this paper we consider the problem of aggregating multiple user-generated tracks in a differentially private manner. For this problem we propose a new aggregation algorithm that adds noise sufficient enough to guarantee privacy while preserving the utility of the aggregate. Under natural and simple assumptions, we also show that this algorithm has provably good guarantees.
View details
Preview abstract
In this paper, we consider the problem of differentially private (DP) algorithms for isotonic regression. For the most general problem of isotonic regression over a partially ordered set (poset) X and for any Lipschitz loss function, we obtain a pure-DP algorithm that, given n input points, has an expected excess empirical risk of roughly width(X)⋅log|X|/n, where width(X) is the width of the poset. In contrast, we also obtain a near-matching lower bound of roughly (width(X)+log|X|)/n, that holds even for approximate-DP algorithms. Moreover, we show that the above bounds are essentially the best that can be obtained without utilizing any further structure of the poset.
In the special case of a totally ordered set and for ℓ1 and ℓ2^2 losses, our algorithm can be implemented in near-linear running time; we also provide extensions of this algorithm to the problem of private isotonic regression with additional structural constraints on the output function.
View details
Preview abstract
In this work, we study the large-scale pretraining of BERT-Large~\citep{devlin2018bert} with differentially private SGD (DP-SGD). We show that combined with a careful implementation, scaling up the batch size to millions (i.e., mega-batches) improves the utility of the DP-SGD step for BERT; we also enhance the training efficiency by using an increasing batch size schedule. Our implementation builds on the recent work of \citet{subramani20}, who demonstrated that the overhead of a DP-SGD step is minimized with effective use of JAX \cite{jax2018github, frostig2018compiling} primitives in conjunction with the XLA compiler \cite{xladocs}. Our implementation achieves a masked language model accuracy of 60.5\% at a batch size of 2M, for $\eps = 5$, which is a reasonable privacy setting. To put this number in perspective, non-private BERT models achieve an accuracy of $\sim$70\%.
View details
Preview abstract
We introduce a new algorithm for numerical composition of privacy random variables, useful for computing the accurate privacy parameters for compositions of mechanisms.
For the task of self-composing a broad class of mechanisms $K$ times, this algorithm achieves a running time \& memory usage of $\polylog(K)$ (e.g., this class includes the sub-sampled Gaussian mechanism, that appears in the analysis of DP-SGD).
By comparison, recent work by Gopi et al. (NeurIPS 2021) has a running time of $\wtilde{O}(\sqrt{K})$ for the same task.
Our approach extends to the case of composing $K$ different mechanisms in the same class, improving upon the running time / memory usage in the work of Gopi et al. from $\wtilde{O}(K^{1.5})$ to $\wtilde{O}(K)$.
View details
On Avoiding the Union Bound When Answering Multiple Differentially Private Queries
Annual Conference on Learning Theory (COLT) (2021), pp. 2133-2146
Preview abstract
In this work, we study the problem of answering k queries with (ε, δ)-differential privacy, where each query has sensitivity one. We give a mechanism for this task that achieves an error bound of O(sqrt(k ln(1/δ))/ε), which is known to be tight (Steinke and Ullman, 2016).
A parallel work by Dagan and Kur (2020) provides a similar result, albeit via a completely different approach. One difference between our work and theirs is that our guarantee holds even when δ < 2^−Ω(k/(log k)^8) whereas theirs does not apply in this case. On the other hand, the algorithm of Dagan and Kur has a remarkable advantage that the ℓ∞ error bound of O(sqrt(k ln(1/δ))/ε) holds not only in expectation but always (i.e., with probability one) while we can only get a high probability (or expected) guarantee on the error.
View details
Near-tight closure bounds for Littlestone and threshold dimensions
Noah Golowich
International Conference on Algorithmic Learning Theory (ALT) (2021), pp. 686-696
Preview abstract
We study closure properties for the Littlestone and threshold dimensions of binary hypothesis classes. Given classes H1,…,Hk of Boolean functions with bounded Littlestone (respectively, threshold) dimension, we establish an upper bound on the Littlestone (respectively, threshold) dimension of the class defined by applying an arbitrary binary aggregation rule to H1,…,Hk. We also show that our upper bounds are nearly tight. Our upper bounds give an exponential (in k) improvement upon analogous bounds shown by Alon et al. (COLT 2020), thus answering a question posed by their work.
View details
Robust and Private Learning of Halfspaces
Thao Nguyen
International Conference on Artificial Intelligence and Statistics (AISTATS) (2021), pp. 1603-1611
Preview abstract
In this work, we study the trade-off between differential privacy and adversarial robustness under L2-perturbations in the context of learning halfspaces. We prove nearly tight bounds on the sample complexity of robust private learning of halfspaces for a large regime of parameters. A highlight of our results is that robust and private learning is harder than robust or private learning alone. We complement our theoretical analysis with experimental results on the MNIST and USPS datasets, for a learning algorithm that is both differentially private and adversarially robust.
View details
Preview abstract
An exciting new development in differential privacy is the shuffled model, in which an anonymous channel enables non-interactive, differentially private protocols with error much smaller than what is possible in the local model, while relying on weaker trust assumptions than in the central model. In this paper, we study basic counting problems in the shuffled model and establish separations between the error that can be achieved in the single-message shuffled model and in the shuffled model with multiple messages per user.
For the problem of frequency estimation for n users and a domain of size B, we obtain:
- A nearly tight lower bound of Ω~(min(n^(1/4), B^(1/2))) on the error in the single-message shuffled model. This implies that the protocols obtained from the amplification via shuffling work of Erlingsson et al. (SODA 2019) and Balle et al. (Crypto 2019) are essentially optimal for single-message protocols. A key ingredient in the proof is a lower bound on the error of locally-private frequency estimation in the low-privacy (aka high ϵ) regime.
- Protocols in the multi-message shuffled model with poly(log B, log n) bits of communication per user and polylog(B) error, which provide an exponential improvement on the error compared to what is possible with single-message algorithms.
For the related selection problem on a domain of size B, we prove:
- A nearly tight lower bound of Ω(B) on the number of users in the single-message shuffled model. This significantly improves on the Ω(B^{1/17}) lower bound obtained by Cheu et al. (Eurocrypt 2019), and when combined with their O~(B^{1/2})-error multi-message protocol, implies the first separation between single-message and multi-message protocols for this problem.
View details
On Distributed Differential Privacy and Counting Distinct Elements
Lijie Chen
Innovations in Theoretical Computer Science (ITCS) (2021), 56:1-56:18
Preview abstract
We study the setup where each of n users holds an element from a discrete set, and the goal is to count the number of distinct elements across all users, under the constraint of (epsilon, delta)-differentially privacy:
- In the local setting, we prove that the additive error of any protocol is Omega(n) for any constant epsilon and any delta inverse polynomial in n. This provides the first separation between global sensitivity and error that is omega(sqrt{n}) in local differential privacy, thus answering a question of Vadhan (2017).
- In the single-message shuffle setting, we prove a lower bound of tilde{Omega}(n) on the error for any constant epsilon and for some delta inverse quasi-polynomial in n. We do so using the moment-matching method from the literature on distribution estimation.
- In the multi-message shuffle setting, we give a protocol with <= 1 message per user in expectation and with an error of tilde{O}(sqrt{n}) for any constant epsilon and delta inverse polynomial in n.
Our proof technique relies on a new notion, that we call dominated protocols, and which can be used to obtain the first non-trivial lower bounds against multi-message shuffle protocols for the well-studied problems of Selection and Parity Learning.
View details
Preview abstract
Most works in learning with differential privacy (DP) have focused on the case where each user has a single sample. In this work, we consider the setting where each user receives $m$ samples and the privacy protection is enforced at the level of each user's data. We show that, in this setting, we may learn with much few number of samples. Specifically, we show that, as long as each user receives sufficiently many samples, we can learn any privately learnable class via an $(\eps, \delta)$-DP algorithm using only $O(\log(1/\delta)/\eps)$ users. For $\eps$-DP algorithms, we show that we can learn using only $O_{\eps}(d)$ users even in the local model, where $d$ is the probabilistic representation dimension. In both cases, we show a nearly-matching lower bound on the number of users required.
A crucial component of our results is a generalization of \emph{global stability}~\cite{BunLM20} that allows the use of public randomness. Under this relaxed notion, we employ a correlated sampling strategy to show that the global stability can be boosted to be arbitrarily close to one, at a polynomial expense in the number of samples.
View details
Locally Private k-Means in One Round
Alisa Chang
International Conference on Machine Learning (ICML) (2021), pp. 1441-1451
Preview abstract
We study k-means clustering in the non-interactive (aka one-round) local model of differential privacy. We give an approximation algorithm that requires a single round of communication and achieves an approximation ratio arbitrarily close to the best non private approximation algorithm. To show the flexibility of our framework, we also demonstrate that it yields a similar near-optimal approximation algorithm in the (one-round) shuffle model.
View details
Sample-efficient proper PAC learning with approximate differential privacy
Noah Golowich
Symposium on Theory of Computing (STOC) (2021), pp. 183-196
Preview abstract
In this paper we prove that the sample complexity of properly learning a class of Littlestone dimension d with approximate differential privacy is Õ(d^6), ignoring privacy and accuracy parameters. This result answers a question of Bun et al. (FOCS 2020) by improving upon their upper bound of 2^O(d) on the sample complexity. Prior to our work, finiteness of the sample complexity for privately learning a class of finite Littlestone dimension was only known for improper private learners, and the fact that our learner is proper answers another question of Bun et al., which was also asked by Bousquet et al. (NeurIPS 2020). Using machinery developed by Bousquet et al., we then show that the sample complexity of sanitizing a binary hypothesis class is at most polynomial in its Littlestone dimension and dual Littlestone dimension. This implies that a class is sanitizable if and only if it has finite Littlestone dimension. An important ingredient of our proofs is a new property of binary hypothesis classes that we call irreducibility, which may be of independent interest.
View details
Google COVID-19 Vaccination Search Insights: Anonymization Process Description
Adam Boulanger
Akim Kumok
Arti Patankar
Benjamin Miller
Chaitanya Kamath
Charlotte Stanton
Chris Scott
Damien Desfontaines
Evgeniy Gabrilovich
Gregory A. Wellenius
John S. Davis
Karen Lee Smith
Krishna Kumar Gadepalli
Mark Young
Shailesh Bavadekar
Tague Griffith
Yael Mayer
Arxiv.org (2021)
Preview abstract
This report describes the aggregation and anonymization process applied to the COVID-19 Vaccination Search Insights~\cite{vaccination}, a publicly available dataset showing aggregated and anonymized trends in Google searches related to COVID-19 vaccination. The applied anonymization techniques protect every user’s daily search activity related to COVID-19 vaccinations with $(\varepsilon, \delta)$-differential privacy for $\varepsilon = 2.19$ and $\delta = 10^{-5}$.
View details
Private Aggregation from Fewer Anonymous Messages
Rasmus Pagh
Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT) (2020), pp. 798-827
Preview abstract
Consider the setup where n parties are each given a number x_i in F_q and the goal is to compute the sum of x_i in a secure fashion and with as little communication as possible. We study this problem in
the anonymized model of Ishai et al. (FOCS 2006) where each party may broadcast anonymous messages on an insecure channel.
We present a new analysis of the one-round "split and mix" protocol of Ishai et al. In order to achieve the same security parameter, our analysis reduces the required number of messages by a Θ(log n) multiplicative factor. We complement our positive result with lower bounds showing that the dependence of the number of messages on the domain size, the number of parties, and the security parameter is essentially tight.
Using a reduction of Balle et al. (2019), our improved analysis of the protocol of Ishai et al. yields, in the same model, an (ε, δ)-differentially private protocol for aggregation that, for any constant ε > 0 and any δ = 1 / poly(n), incurs only a constant error and requires only a constant number of messages per party. Previously, such a protocol was known only for Θ(log n) messages per party.
View details
Private Counting from Anonymous Messages: Near-Optimal Accuracy with Vanishing Communication Overhead
Rasmus Pagh
International Conference on Machine Learning (ICML) (2020), pp. 3505-3514
Preview abstract
Differential privacy (DP) is a formal notion for quantifying the privacy loss of algorithms. Algorithms in the central model of DP achieve high accuracy but make the strongest trust assumptions whereas those in the local DP model make the weakest trust assumptions but incur substantial accuracy loss. The shuffled DP model (Bittau et al., 2017; Erlingsson et al., 2019; Cheu et al.,2019) has recently emerged as a feasible middle ground between the central and local models, providing stronger trust assumptions than the former while promising higher accuracies than the latter. In this paper, we obtain practical communication-efficient algorithms in the shuffled DP model for two basic aggregation primitives used in machine learning: 1) binary summation, and 2) histograms over a moderate number of buckets. Our algorithms achieve accuracy that is arbitrarily close to that of central DP algorithms with an expected communication per user essentially matching what is needed without any privacy constraints! We demonstrate the practicality of our algorithms by experimentally comparing their performance to several widely-used protocols such as Randomized Response (Warner, 1965) and RAPPOR (Erlingsson et al., 2014).
View details
Pure Differentially Private Summation from Anonymous Messages
Noah Golowich
Rasmus Pagh
Information Theoretic Cryptography (ITC) (2020), 15:1-15:23
Preview abstract
The shuffled (aka anonymous) model has recently generated significant interest as a candidate dis- tributed privacy framework with trust assumptions better than the central model but with achievable error rates smaller than the local model. In this paper, we study pure differentially private protocols in the shuffled model for summation, a very basic and widely used primitive. Specifically:
• For the binary summation problem where each of n users holds a bit as an input, we give a pure ε-
differentially private protocol for estimating the number of ones held by the users up to an absolute
error of Oε(1), and where each user sends Oε(logn) messages each consisting of a single bit. This √
is the first pure differentially private protocol in the shuffled model with error o( n) for constant values of ε.
Using our binary summation protocol as a building block, we give a pure ε-differentially private protocol that performs summation of real numbers (in [0,1]) up to an absolute error of Oε(1), and where each user sends Oε(log3 n) messages each consisting of O(loglogn) bits.
• In contrast, we show that for any pure ε-differentially private protocol for binary summation in the shuffled model having absolute error n0.5−Ω(1), the per user communication has to be at least
Ωε( log n) bits. This implies (i) the first separation between the (bounded-communication) multi- message shuffled model and the central model, and (ii) the first separation between pure and approximate differentially private protocols in the shuffled model.
Interestingly, over the course of proving our lower bound, we have to consider (a generalization of) the following question which might be of independent interest: given γ ∈ (0, 1), what is the smallest positive integer m for which there exist two random variables X0 and X1 supported on {0, . . . , m} such that (i) the total variation distance between X0 and X1 is at least 1 − γ, and (ii) the moment generating functions of X0 and X1 are within a constant factor of each other everywhere? We show that the answer to this question is m = Θ(
View details
Differentially Private Clustering: Tight Approximation Ratios
Advances in Neural Information Processing Systems (NeurIPS) (2020)
Preview abstract
We study the task of differentially private clustering. For several basic clustering problems, including Euclidean DensestBall, 1-Cluster, k-means, and k-median, we give efficient differentially private algorithms that achieve essentially the same approximation ratios as those that can be obtained by any non-private algorithm, while incurring only small additive errors. This improves upon existing efficient algorithms that only achieve some large constant approximation factors.
Our results also imply an improved algorithm for the Sample and Aggregate privacy framework. Furthermore, we show that one of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
View details
Advances and Open Problems in Federated Learning
Brendan Avent
Aurélien Bellet
Mehdi Bennis
Arjun Nitin Bhagoji
Graham Cormode
Rachel Cummings
Rafael G.L. D'Oliveira
Salim El Rouayheb
David Evans
Josh Gardner
Adrià Gascón
Phillip B. Gibbons
Marco Gruteser
Zaid Harchaoui
Chaoyang He
Lie He
Zhouyuan Huo
Justin Hsu
Martin Jaggi
Tara Javidi
Gauri Joshi
Mikhail Khodak
Jakub Konečný
Aleksandra Korolova
Farinaz Koushanfar
Sanmi Koyejo
Tancrède Lepoint
Yang Liu
Prateek Mittal
Richard Nock
Ayfer Özgür
Rasmus Pagh
Ramesh Raskar
Dawn Song
Weikang Song
Sebastian U. Stich
Ziteng Sun
Florian Tramèr
Praneeth Vepakomma
Jianyu Wang
Li Xiong
Qiang Yang
Felix X. Yu
Han Yu
Arxiv (2019)
Preview abstract
Federated learning (FL) is a machine learning setting where many clients (e.g., mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g., service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and mitigates many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents a comprehensive list of open problems and challenges.
View details
No Results Found