Jump to Content
Uri Stemmer

Uri Stemmer

I am interested in privacy-preserving data analysis, computational learning theory, and algorithms. Typically my research is theoretical. I am also a faculty member at the school of computer science of Tel Aviv University.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract The problem of learning threshold functions is a fundamental one in machine learning. Classical learning theory implies sample complexity of $O(\xi^{-1} \log(1/\beta))$ (for generalization error $\xi$ with confidence $1-\beta$). The private version of the problem, however, is more challenging and in particular, the sample complexity must depend on the size $|X|$ of the domain. Progress on quantifying this dependence, via lower and upper bounds, was made in a line of works over the past decade. In this paper, we finally close the gap for approximate-DP and provide a nearly tight upper bound of $\widetilde{O}(\log^* |X|)$, which matches a lower bound by Alon et al (that applies even with improper learning) and improves over a prior upper bound of $\widetilde{O}((\log^* |X|)^{1.5})$ by Kaplan et al. We also provide matching upper and lower bounds of $\tilde{\Theta}(2^{\log^*|X|})$ for the additive error of private quasi-concave optimization (a related and more general problem). Our improvement is achieved via the novel Reorder-Slice-Compute paradigm for private data analysis which we believe will have further applications. View details
    Private Everlasting Prediction
    Moni Naor
    Kobbi Nissim
    Chao Yan
    NeurIPS 2023
    Preview abstract A private learner is trained on a sample of labeled points and generates a hypothesis that can be used for predicting the labels of newly sampled points while protecting the privacy of the training set [Kasiviswannathan et al., FOCS 2008]. Research uncovered that private learners may need to exhibit significantly higher sample complexity than non-private learners as is the case with, e.g., learning of one-dimensional threshold functions [Bun et al., FOCS 2015, Alon et al., STOC 2019]. We explore prediction as an alternative to learning. Instead of putting forward a hypothesis, a predictor answers a stream of classification queries. Earlier work has considered a private prediction model with just a single classification query [Dwork and Feldman, COLT 2018]. We observe that when answering a stream of queries, a predictor must modify the hypothesis it uses over time, and, furthermore, that it must use the queries for this modification, hence introducing potential privacy risks with respect to the queries themselves. We introduce private everlasting prediction taking into account the privacy of both the training set and the (adaptively chosen) queries made to the predictor. We then present a generic construction of private everlasting predictors in the PAC model. The sample complexity of the initial training sample in our construction is quadratic (up to polylog factors) in the VC dimension of the concept class. Our construction allows prediction for all concept classes with finite VC dimension, and in particular threshold functions with constant size initial training sample, even when considered over infinite domains, whereas it is known that the sample complexity of privately learning threshold functions must grow as a function of the domain size and hence is impossible for infinite domains. View details
    Preview abstract Composition theorems are general and powerful tools that facilitate privacy accounting across multiple data accesses from per-access privacy bounds. However they often result in weaker bounds compared with end-to-end analysis. Two popular tools that mitigate that are the exponential mechanism (or report noisy max) and the sparse vector technique. They were generalized in a couple of recent private selection/test frameworks, including the work by Liu and Talwar (STOC 2019), and Papernot and Steinke (ICLR 2022). In this work, we first present an alternative framework for private selection and testing with a simpler privacy proof and equally-good utility guarantee. Second, we observe that the private selection framework (both previous ones and ours) can be applied to improve the accuracy/confidence trade-off for many fundamental privacy-preserving data-analysis tasks, including query releasing, top-k selection, and stable selection. Finally, for online settings, we apply the private testing to design a mechanism for adaptive query releasing, which improves the sample complexity dependence on the confidence parameter for the celebrated private multiplicative weights algorithm of Hardt and Rothblum (FOCS 2010). View details
    Preview abstract Streaming algorithms are typically analyzed in the oblivious setting, where we assume that the input stream is fixed in advance. Recently, there is a growing interest in designing adversarially robust streaming algorithms that must maintain utility even when the input stream is chosen adaptively and adversarially as the execution progresses. While several fascinating results are known for the adversarial setting, in general, it comes at a very high cost in terms of the required space. Motivated by this, in this work we set out to explore intermediate models that allow us to interpolate between the oblivious and the adversarial models. Specifically, we put forward the following two models: - The bounded interruptions model, in which we assume that the adversary is only partially adaptive. - The advice model, in which the streaming algorithm may occasionally ask for one bit of advice. We present both positive and negative results for each of these two models. In particular, we present generic reductions from each of these models to the oblivious model. This allows us to design robust algorithms with significantly improved space complexity compared to what is known in the plain adversarial model. View details
    Preview abstract In adaptive data analysis, a mechanism gets n i.i.d. samples from an unknown distribution D, and is required to provide accurate estimations to a sequence of adaptively chosen statistical queries with respect to D. Hardt and Ullman [2014] and Steinke and Ullman [2015] showed that, in general, it is computationally hard to answer more than n^2 adaptive queries, assuming the existence of one-way functions. However, these negative results strongly rely on an adversarial model that significantly advantages the adversarial analyst over the mechanism, as the analyst, who chooses the adaptive queries, also chooses the underlying distribution D. This imbalance raises questions with respect to the applicability of the obtained hardness results -- an analyst who has complete knowledge of the underlying distribution D would have little need, if at all, to issue statistical queries to a mechanism which only holds a finite number of samples from D. We consider more restricted adversaries, called balanced, where each such adversary consists of two separated algorithms: The sampler who is the entity that chooses the distribution and provides the samples to the mechanism, and the analyst who chooses the adaptive queries, but does not have a prior knowledge of the underlying distribution. We improve the quality of previous lower bounds by revisiting them using an efficient balanced adversary, under standard public-key cryptography assumptions. We show that these stronger hardness assumptions are unavoidable in the sense that any computationally bounded balanced adversary that has the structure of all known attacks, implies the existence of public-key cryptography. View details
    Preview abstract CountSketch and Feature Hashing (the "hashing trick") are popular randomized dimensionality reduction methods that support recovery of $\ell_2$-heavy hitters (keys $i$ where $v_i^2 > \epsilon \|\boldsymbol{v}\|_2^2$) and approximate inner products. When the inputs are {\em not adaptive} (do not depend on prior outputs), classic estimators applied to a sketch of size $O(\ell/\epsilon)$ are accurate for a number of queries that is exponential in $\ell$. When inputs are adaptive, however, an adversarial input can be constructed after $O(\ell)$ queries with the classic estimator and the best known robust estimator only supports $\tilde{O}(\ell^2)$ queries. In this work we show that this quadratic dependence is in a sense inherent: We design an attack that after $O(\ell^2)$ queries produces an adversarial input vector whose sketch is highly biased. Our attack uses "natural" non-adaptive inputs (only the final adversarial input is chosen adaptively) and universally applies with any correct estimator, including one that is unknown to the attacker. In that, we expose inherent vulnerability of this fundamental method. View details
    Preview abstract We introduce the concurrent shuffle model of differential privacy. In this model we have multiple concurrent shufflers permuting messages from different, possibly overlapping, batches of users. Similarly to the standard (single) shuffle model, the privacy requirement is that the concatenation of all shuffled messages should be differentially private. We study the private continual summation problem (a.k.a. the counter problem) and show that the concurrent shuffle model allows for significantly improved error compared to a standard (single) shuffle model. Specifically, we give a summation algorithm with error $\Tilde{O}(n^{1/(2k+1)})$ with $k$ concurrent shufflers on a sequence of length $n$. Furthermore, we prove that this bound is tight for any $k$, even if the algorithm can choose the sizes of the batches adaptively. For $k=\log n$ shufflers, the resulting error is polylogarithmic, much better than $\Tilde{\Theta}(n^{1/3})$ which we show is the smallest possible with a single shuffler. We use our online summation algorithm to get algorithms with improved regret bounds for the contextual linear bandit problem. In particular we get optimal $\Tilde{O}(\sqrt{n})$ regret with $k= \Tilde{\Omega}(\log n)$ concurrent shufflers. View details
    Preview abstract Classical streaming algorithms operate under the (not always reasonable) assumption that the input stream is fixed in advance. Recently, there is a growing interest in designing robust streaming algorithms that provide provable guarantees even when the input stream is chosen adaptively as the execution progresses. We propose a new framework for robust streaming that combines techniques from two recently suggested frameworks by Hassidim et al. [NeurIPS 2020] and by Woodruff and Zhou [FOCS 2021]. These recently suggested frameworks rely on very different ideas, each with its own strengths and weaknesses. We combine these two frameworks into a single hybrid framework that obtains the "best of both worlds", thereby solving a question left open by Woodruff and Zhou. View details
    Preview abstract In this work we revisit an interactive variant of joint differential privacy, recently introduced by Naor et al. [2023], and generalize it towards handling online processes in which existing privacy definitions seem too restrictive. We study basic properties of this definition and demonstrate that it satisfies (suitable variants) of group privacy, composition, and post processing. In order to demonstrate the advantages of this privacy definition compared to traditional forms of differential privacy, we consider the basic setting of online classification. We show that any (possibly non-private) learning rule can be effectively transformed to a private learning rule with only a polynomial overhead in the mistake bound. This demonstrates a stark difference with traditional forms of differential privacy, such as the one studied by Golowich and Livni [2021], where only a double exponential overhead in the mistake bound is known (via an information theoretic upper bound). View details
    Preview abstract We study the space complexity of the two related fields of differential privacy and adaptive data analysis. Specifically, (1) Under standard cryptographic assumptions, we show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy, compared to the space needed without privacy. To the best of our knowledge, this is the first separation between the space complexity of private and non-private algorithms. (2) The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries. We revisit previous lower bounds at a foundational level, and show that they are a consequence of a space bottleneck rather than a sampling bottleneck. To obtain our results, we define and construct an encryption scheme with multiple keys that is built to withstand a limited amount of key leakage in a very particular way. View details
    Preview abstract A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy. This connection allows us to design new adversarially robust streaming algorithms that outperform the current state-of-the-art constructions for many interesting regimes of parameters. View details
    Preview abstract We present differentially private efficient algorithms for learning polygons in the plane (which are not necessarily convex). Our algorithm achieves $(\alpha,\beta)$-PAC learning and $(\eps,\delta)$-differential privacy using a sample of size $O\left(\frac{k}{\alpha\eps}\log\left(\frac{|X|}{\beta\delta}\right)\right)$, where the domain is $X\times X$ and $k$ is the number of edges in the (potentially non-convex) polygon. View details
    Preview abstract A dynamic algorithm against an adaptive adversary is required to be correct when the adversary chooses the next update after seeing the previous outputs of the algorithm. We obtain faster dynamic algorithms against an adaptive adversary and separation results between what is achievable in the oblivious vs. adaptive settings. To get these results we exploit techniques from differential privacy, cryptography, and adaptive data analysis. We give a general reduction transforming a dynamic algorithm against an oblivious adversary to a dynamic algorithm robust against an adaptive adversary. This reduction maintains several copies of the oblivious algorithm and uses differential privacy to protect their random bits. Using this reduction we obtain dynamic algorithms against an adaptive adversary with improved update and query times for global minimum cut, all pairs distances, and all pairs effective resistance. We further improve our update and query times by showing how to maintain a sparsifier over an expander decomposition that can be refreshed fast. This fast refresh enables it to be robust against what we call a blinking adversary that can observe the output of the algorithm only following refreshes. We believe that these techniques will prove useful for additional problems. On the flip side, we specify dynamic problems that, assuming a random oracle, every dynamic algorithm that solves them against an adaptive adversary must be polynomially slower than a rather straightforward dynamic algorithm that solves them against an oblivious adversary. We first show a separation result for a search problem and then show a separation result for an estimation problem. In the latter case our separation result draws from lower bounds in adaptive data analysis. View details
    Preview abstract \texttt{CountSketch} is a popular dimensionality reduction technique that maps vectors to a lower-dimension using a randomized set of linear measurements. The sketch has the property that the $\ell_2$-heavy hitters of a vector (entries with $v_i^2 \geq \frac{1}{k}\|\boldsymbol{v}\|^2_2$) can be recovered from its sketch. We study the robustness of the sketch in adaptive settings, such as online optimization, where input vectors may depend on the output from prior inputs. We show that the classic estimator can be attacked with a number of queries of the order of the sketch size and propose a robust estimator (for a slightly modified sketch) that allows for quadratic number of queries. We improve robustness by a factor of $\sqrt{k}$ (for $k$ heavy hitters) over prior approaches. View details
    Preview abstract Differentially private algorithms for common metric aggregation tasks, such as clustering or averaging, often have limited practicality due to their complexity or to the large number of data points that is required for accurate results. We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points $\cD$ from an unrestricted (pseudo) metric space as input. When $\cD$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a ``stable'' subset $\cC \subseteq \cD$ that includes all points, except possibly few outliers, and is {\em guaranteed} to have diameter $r$. $\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy. Surprisingly, $\mathsf{FriendlyCore}$ is light-weight with no dependence on the dimension. We empirically demonstrate its advantages in boosting the accuracy of mean estimation and clustering tasks such as $k$-means and $k$-GMM, outperforming tailored methods. View details
    Differentially-Private Bayes Consistency
    Aryeh Kontorovich
    Shay Moran
    Menachem Sadigurschi
    Archive, Archive, Archive
    Preview abstract We construct a universally Bayes consistent learning rule which satisfies differential privacy (DP). We first handle the setting of binary classification and then extend our rule to the more general setting of density estimation (with respect to the total variation metric). The existence of a universally consistent DP learner reveals a stark difference with the distribution-free PAC model. Indeed, in the latter DP learning is extremely limited: even one-dimensional linear classifiers are not privately learnable in this stringent model. Our result thus demonstrates that by allowing the learning rate to depend on the target distribution, one can circumvent the above-mentioned impossibility result and in fact learn \emph{arbitrary} distributions by a single DP algorithm. As an application, we prove that any VC class can be privately learned in a semi-supervised setting with a near-optimal \emph{labelled} sample complexity of $\tilde O(d/\eps)$ labeled examples (and with an unlabeled sample complexity that can depend on the target distribution). View details
    Adaptive Data Analysis with Correlated Observations
    Aryeh Kontorovich
    Menachem Sadigurschi
    ICML 2022
    Preview abstract The vast majority of the work on adaptive data analysis focuses on the case where the samples in the dataset are independent. Several approaches and tools have been successfully applied in this context, such as differential privacy, max-information, compression arguments, and more. The situation is far less well-understood without the independence assumption. We embark on a systematic study of the possibilities of adaptive data analysis with correlated observations. First, we show that, in some cases, differential privacy guarantees generalization even when there are dependencies within the sample, which we quantify using a notion we call Gibbs-dependence. We complement this result with a tight negative example. Second, we show that the connection between transcript-compression and adaptive data analysis can be extended to the non-iid setting. View details
    Preview abstract In this work we study the problem of differentially private (DP) quantiles, in which given data $X$ set and quantiles $q_1, ..., q_m \in [0,1]$, we want to output $m$ quantile estimations such that the estimation is as close as possible to the optimal solution and preserves DP. In this work we provide \algoname~(AQ), an algorithm and implementation for the DP-quantiels problem. We analyze our algorithm and provide a mathematical proof of its error bounds for the general case and for the concrete case of uniform quantiles utility. We also experimentally evaluate \algoref~and conclude that it obtains higher accuracy than the existing baselines while having lower run time. We reduce the problem of DP-data-sanitization to the DP-uniform-quantiles problem and analyze the resulting mathematical bounds for the error in this case. We analyze our algorithm under the definition of zero Concentrated Differential Privacy (zCDP), and supply the error guarantees of our \algoref~in this case. Finally, we show the empirical benefit our algorithm gains under the zCDP definition. View details
    Preview abstract We present efficient differentially private algorithms for learning unions of polygons in the plane (which are not necessarily convex). Our algorithms are $(\alpha,\beta)$--probably approximately correct and $(\varepsilon,\delta)$--differentially private using a sample of size $\tilde{O}\left(\frac{1}{\alpha\varepsilon}k\log d\right)$, where the domain is $[d]\times[d]$ and $k$ is the number of edges in the union of polygons. Our algorithms are obtained by designing a private variant of the classical (nonprivate) learner for conjunctions using the greedy algorithm for set cover. View details
    Preview abstract The amount of training-data is one of the key factors which determines the generalization capacity of learning algorithms. Intuitively, one expects the error rate to decrease as the amount of training-data increases. Perhaps surprisingly, natural attempts to formalize this intuition give rise to interesting and challenging mathematical questions. For example, in their classical book on pattern recognition, Devroye, Gyorfi, and Lugosi (1996) ask whether there exists a monotone Bayes-consistent algorithm. This question remained open for over 25 years, until recently Pestov (2021) resolved it for binary classification, using an intricate construction of a monotone Bayes-consistent algorithm. We derive a general result in multiclass classification, showing that every learning algorithm A can be transformed to a monotone one with similar performance. Further, the transformation is efficient and only uses a black-box oracle access to A. This demonstrates that one can provably avoid non-monotonic behaviour without compromising performance, thus answering questions asked by Devroye, Gyorfi, and Lugosi (1996), Viering, Mey, and Loog (2019), Viering and Loog (2021), and by Mhammedi (2021). Our general transformation readily implies monotone learners in a variety of contexts: for example, Pestov’s result follows by applying it on any Bayes-consistent algorithm (e.g., k-Nearest-Neighbours). In fact, our transformation extends Pestov’s result to classification tasks with an arbitrary number of labels. This is in contrast with Pestov’s work which is tailored to binary classification. In addition, we provide uniform bounds on the error of the monotone algorithm. This makes our transformation applicable in distribution-free settings. For example, in PAC learning it implies that every learnable class admits a monotone PAC learner. This resolves questions asked by Viering, Mey, and Loog (2019); Viering and Loog (2021); Mhammedi (2021). View details
    Preview abstract Clustering is a fundamental problem in data analysis. In differentially private clustering, the goal is to identify k cluster centers without disclosing information on individual data points. Despite significant research progress, the problem had so far resisted practical solutions. In this work we aim at providing simple implementable differentially private clustering algorithms that provide utility when the data is "easy", e.g., when there exists a significant separation between the clusters. We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results. We are able to get improved sample complexity bounds in some cases of Gaussian mixtures and k-means. We complement our theoretical analysis with an empirical evaluation on synthetic data. View details
    Preview abstract We give an $(\eps,\delta)$-differentially private algorithm for the Multi-Armed Bandit (MAB) problem in the shuffle model with a distribution-dependent regret of $O\left(\left(\sum_{a:\Delta_a>0}\frac{\log T}{\Delta_a}\right)+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\eps}\right)$, and a distribution-independent regret of $O\left(\sqrt{kT\log T}+\frac{k\sqrt{\log\frac{1}{\delta}}\log T}{\eps}\right)$, where $T$ is the number of rounds, $\Delta_a$ is the suboptimality gap of the action $a$, and $k$ is the total number of actions. Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model. View details
    Preview abstract We revisit one of the most basic and widely applicable techniques in the literature of differential privacy -- the sparse vector technique [Dwork et al., STOC 2009]. This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be. It allows to ask an unbounded number of queries as long as the answer is close to what we expect, and halts following the first query for which this is not the case. We suggest an alternative, equally simple, algorithm that can continue testing queries as long as any single individual does not contribute to the answer of too many queries whose answer deviates substantially form what we expect. Our analysis is subtle and some of its ingredients may be more widely applicable. In some cases our new algorithm allows to privately extract much more information from the database than the original. We demonstrate this by applying our algorithm to the shifting heavy-hitters problem: On every time step, each of n users gets a new input, and the task is to privately identify all the current heavy-hitters. That is, on time step i, the goal is to identify all data elements x such that many of the users have x as their current input. We present an algorithm for this problem with improved error guarantees over what can be obtained using existing techniques. Specifically, the error of our algorithm depends on the maximal number of times that a single user holds a heavy-hitter as input, rather than the total number of times in which a heavy-hitter exists. View details
    Preview abstract We revisit the fundamental problem of learning Axis-Aligned-Rectangles over a finite grid $X^d$ with differential privacy. Existing results show that the sample complexity of this problem is at most $\min\left\{ d\log|X| , d^{1.5}\left(\log^*|X| \right)^{1.5}\right\}$. That is, existing constructions either require sample complexity that grows linearly with $\log|X|$, or else it grows super linearly with the dimension $d$. We present a novel algorithm that reduces the sample complexity to only $\tildeO{d \left(\log^*|X|\right)^{1.5}}$, attaining a dimensionality optimal dependency without requiring the sample complexity to grow with $\log|X|$. The technique used in order to attain this improvement involves the deletion of ``exposed'' data-points on the go, in a fashion designed to avoid the cost of the adaptive composition theorems. The core of this technique may be of individual interest, introducing a new method for constructing statistically-efficient private algorithms. View details
    Learning and Evaluating a Differentially Private Pre-trained Language Model
    Shlomo Hoory
    Avichai Tendler
    Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics, Punta Cana, Dominican Republic, pp. 1178-1189
    Preview abstract Contextual language models have led to significantly better results on a plethora of language understanding tasks, especially when pre-trained on the same data as the downstream task. While this additional pre-training usually improves performance, it often leads to information leakage and therefore risks the privacy of individuals mentioned in the training data. One method to guarantee the privacy of such individuals is to train a differentially private model, but this usually comes at the expense of model performance. Moreover, it is hard to tell given a privacy parameter $\epsilon$ what was the effect on the trained representation and whether it maintained relevant information while improving privacy. To improve privacy and guide future practitioners and researchers, we demonstrate here how to train a differentially private pre-trained language model (i.e., BERT) with a privacy guarantee of $\epsilon=0.5$ with only a small degradation in performance. We experiment on a dataset of clinical notes with a model trained on an entity extraction (EE) task on and compare it to a similar model trained without differential privacy. Finally, we present a series of experiments showing how to interpret the differentially private representation and understand the information lost and maintained in this process. View details
    Preview abstract Streaming algorithms are algorithms for processing large data streams, using only a limited amount of memory. Classical streaming algorithms typically work under the assumption that the input stream is chosen independently from the internal state of the algorithm. Algorithms that utilize this assumption are called oblivious algorithms. Recently, there is a growing interest in studying streaming algorithms that maintain utility also when the input stream is chosen by an adaptive adversary, possibly as a function of previous estimates given by the streaming algorithm. Such streaming algorithms are said to be adversarially-robust. By combining techniques from learning theory with cryptographic tools from the bounded storage model, we separate the oblivious streaming model from the adversarially-robust streaming model. Specifically, we present a streaming problem for which every adversarially-robust streaming algorithm must use polynomial space, while there exists a classical (oblivious) streaming algorithm that uses only polylogarithmic space. This is the first general separation between the capabilities of these two models, resolving one of the central open questions in adversarial robust streaming. View details
    Preview abstract Common datasets have the form of elements with keys (e.g., transactions and products) and the goal is to perform analytics on the aggregated form of key and frequency pairs. A weighted sample of keys by (a function of) frequency is a highly versatile summary that provides a sparse set of representative keys and supports approximate evaluations of query statistics. We propose private weighted sampling (PWS): A method that sanitizes a weighted sample as to ensure element-level differential privacy, while retaining its utility to the maximum extent possible. PWS maximizes the reporting probabilities of keys and estimation quality of a broad family of statistics. PWS improves over the state of the art even for the well-studied special case of private histograms, when no sampling is performed. We empirically observe significant performance gains of 20%-300% increase in key reporting for common Zipfian frequency distributions and accurate estimation with X2-8 lower frequencies. PWS is applied as a post-processing of a non-private sample, without requiring the original data. Therefore, it can be a seamless addition to existing implementations, such as those optimizes for distributed or streamed data. We believe that due to practicality and performance, PWS may become a method of choice in applications where privacy is desired. View details
    Preview abstract Let H be a class of boolean functions and consider a composed class H' that is derived from H using some arbitrary aggregation rule (for example, H' may be the class of all 3-wise majority-votes of functions in H). We upper bound the Littlestone dimension of H' in terms of that of H. As a corollary, we derive closure properties for online learning and private PAC learning. The derived bounds on the Littlestone dimension exhibit an undesirable exponential dependence. For private learning, we prove close to optimal bounds that circumvents this suboptimal dependency. The improved bounds on the sample complexity of private learning are derived algorithmically via transforming a private learner for the original class H to a private learner for the composed class H'. Using the same ideas we show that any (proper or improper) private algorithm that learns a class of functions H in the realizable case (i.e., when the examples are labeled by some function in the class) can be transformed to a private algorithm that learns the class H in the agnostic case. View details
    Preview abstract We design a new algorithm for the Euclidean k-means problem that operates in the local model of differential privacy. Unlike in the non-private literature, differentially private algorithms for the k-means incur both additive and multiplicative errors. Our algorithm significantly reduces the additive error while keeping the multiplicative error the same as in previous state-of-the-art results. Specifically, on a database of size n, our algorithm guarantees O(1) multiplicative error and $n^{1/2+a}$ additive error for an arbitrarily small constant a>0. All previous algorithms in the local model had additive error $n^{2/3+a}$. We show that the additive error we obtain is almost optimal in terms of its dependency in the database size n. Specifically, we give a simple lower bound showing that every locally-private algorithm for the k-means must have additive error at least $\sqrt{n}$. View details
    On the Round Complexity of the Shuffle Model
    Amos Beimel
    Iftach Haitner
    Kobbi Nissim
    TCC 2020
    Preview abstract The shuffle model of differential privacy was proposed as a viable model for performing distributed differentially private computations. Informally, the model consists of an untrusted analyzer that receives messages sent by participating parties via a shuffle functionality, the latter potentially disassociates messages from their senders. Prior work focused on one-round differentially private shuffle model protocols, demonstrating that functionalities such as addition and histograms can be performed in this model with accuracy levels similar to that of the curator model of differential privacy, where the computation is performed by a fully trusted party. Focusing on the round complexity of the shuffle model, we ask in this work what can be computed in the shuffle model of differential privacy with two rounds. Ishai et al. [FOCS 2006] showed how to use one round of the shuffle to establish secret keys between every two parties. Using this primitive to simulate a general secure multi-party protocol increases its round complexity by one. We show how two parties can use one round of the shuffle to send secret messages without having to first establish a secret key, hence retaining round complexity. Combining this primitive with the two-round semi-honest protocol of Applebaum et al. [TCC 2018], we obtain that every randomized functionality can be computed in the shuffle model with an honest majority, in merely two rounds. This includes any differentially private computation. We then move to examine differentially private computations in the shuffle model that (i) do not require the assumption of an honest majority, or (ii) do not admit one-round protocols, even with an honest majority. For that, we introduce two computational tasks: the common-element problem and the nested-common-element problem, for which we show separations between one-round and two-round protocols. View details
    Preview abstract We study the question of how to compute a point in the convex hull of an input set $S$ of $n$ points in $R^d$ in a differentially private manner. This question, which is trivial non-privately, turns out to be quite deep when imposing differential privacy. In particular, it is known that the input points must reside on a fixed finite subset $G\subseteq R^d$, and furthermore, the size of $S$ must grow with the size of $G$. Previous works focused on understanding how $n$ needs to grow with $|G|$, and showed that $n=O\left(d^{2.5}\cdot8^{\log^*|G|}\right)$ suffices (so $n$ does not have to grow significantly with $|G|$). However, the available constructions exhibit running time at least $|G|^{d^2}$, where typically $|G|=X^d$ for some (large) discretization parameter $X$, so the running time is in fact $\Omega(X^{d^3})$. In this paper we give a differentially private algorithm that runs in $O(n^d)$ time, assuming that $n=\Omega(d^4\log X)$. To get this result we study and exploit some structural properties of the Tukey levels (the regions $D_{\ge k}$ consisting of points whose Tukey depth is at least $k$, for $k=0,1,...$). In particular, we derive lower bounds on their volumes for point sets $S$ in general position, and develop a rather subtle mechanism for handling point sets $S$ in degenerate position (where the deep Tukey regions have zero volume). A naive approach to the construction of the Tukey regions requires $n^{O(d^2)}$ time. To reduce the cost to $O(n^d)$, we use an approximation scheme for estimating the volumes of the Tukey regions (within their affine spans in case of degeneracy), and for sampling a point from such a region, a scheme that is based on the volume estimation framework of Lovasz and Vempala (FOCS 2003) and of Cousins and Vempala (STOC 2015). Making this framework differentially private raises a set of technical challenges that we address. View details
    Preview abstract We present a private learner for halfspaces over a finite grid $G$ in $R^d$ with sample complexity $d^{2.5}\cdot 2^{\log^*|G|}$, which improves the state-of-the-art result of [Beimel et al., COLT 2019] by a $d^2$ factor. The building block for our learner is a new differentially private algorithm for approximately solving the linear feasibility problem: Given a feasible collection of $m$ linear constraints of the form $Ax\geq b$, the task is to privately identify a solution $x$ that satisfies most of the constraints. Our algorithm is iterative, where each iteration determines the next coordinate of the constructed solution $x$. View details
    Preview abstract We study the sample complexity of learning threshold functions under the constraint of differential privacy. It is assumed that each labeled example in the training data is the information of one individual and we would like to come up with a generalizing hypothesis while guaranteeing differential privacy for the individuals. Intuitively, this means that any single labeled example in the training data should not have a significant effect on the choice of the hypothesis. This problem has received much attention recently; unlike the non-private case, where the sample complexity is independent of the domain size and just depends on the desired accuracy and confidence, for private learning the sample complexity must depend on the domain size X (even for approximate differential privacy). Alon et al. (STOC 2019) showed a lower bound of $\Omega(\log^*|X|)$ on the sample complexity and Bun et al. (FOCS 2015) presented an approximate-private learner with sample complexity $\tilde{O}\left(2^{\log^*|X|}\right)$. In this work we reduce this gap significantly, almost settling the sample complexity. We first present a new upper bound (algorithm) of $\tilde{O}\left(\left(\log^*|X|\right)^2\right)$ on the sample complexity and then present an improved version with sample complexity $\tilde{O}\left(\left(\log^*|X|\right)^{1.5}\right)$. Our algorithm is constructed for the related interior point problem, where the goal is to find a point between the largest and smallest input elements. It is based on selecting an input-dependent hash function and using it to embed the database into a domain whose size is reduced logarithmically; this results in a new database, an interior point of which can be used to generate an interior point in the original database in a differentially private manner. View details
    Preview abstract Motivated by the desire to bridge the utility gap between local and trusted curator models of differential privacy for practical applications, we initiate the theoretical study of a hybrid model introduced by "Blender" [Avent et al. USENIX Security ’17], in which differentially private protocols of n agents that work in the local-model are assisted by a differentially private curator that has access to the data of m additional users. We focus on the regime where m is much smaller than n and study the new capabilities of the interaction in this (m,n)-hybrid model. We show that, despite the fact that the hybrid model adds no new capabilities for the basic task of simple hypothesis-testing, there are many other tasks (under a wide range of parameters) which can be solved in the hybrid model yet cannot be solved either by the curator or by the local-users separately. Moreover, we exhibit additional tasks where at least one round of interaction between the curator and the local-users is necessary - namely, no hybrid model without such interaction can solve these tasks. View details
    Preview abstract A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy. This connection allows us to design new adversarially robust streaming algorithms that outperform the current state-of-the-art constructions for many interesting regimes of parameters. View details
    No Results Found