Jump to Content
Edith Cohen

Edith Cohen

I am a Research Scientist at Google (Mountain View, CA) since 2015. I am also a (visiting) full professor at the School of Computer Science at Tel Aviv University in Israel. I attended Tel Aviv University (Israel) for my undergraduate studies in math, physics, and computer science, graduating in 1985, and continued to obtain an M.Sc. in computer science in 1986, supervised by Michael Tarsi. I then headed to Stanford, CA, where I worked on a Ph.D. in Computer Science, with Andrew Goldberg and Nimrod Megiddo, graduating in 1991. From 1991 to 2012, I was a member of the research staff, first at the legendary AT&T Bell Laboratories , and after a 1997 split, at AT&T Labs Research. In 1997 I also visited the Computer Science Division at UC Berkeley. From 2012 to 2014 I was a principal researcher at Microsoft Research (Silicon Valley).

My research interests include Algorithms design, Data Mining, Machine Learning, Optimization, Networks, and more. I prefer a principled approach to modeling and design and I am always looking to learn and expand my horizons. In the span of my research career, I developed models and scalable algorithms in a wide range of areas including query processing and optimization, data summarization, content search and delivery, caching, prefetching, routing, streaming and distributed computation, fundamental graph algorithms and scalable mining of massive graphs. My work enables the leveraging of much larger data sets than previously possible. More details on my work can be found on my personal web page.

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, desc
  • Year
  • Year, desc
    Preview abstract The problem of learning threshold functions is a fundamental one in machine learning. Classical learning theory implies sample complexity of $O(\xi^{-1} \log(1/\beta))$ (for generalization error $\xi$ with confidence $1-\beta$). The private version of the problem, however, is more challenging and in particular, the sample complexity must depend on the size $|X|$ of the domain. Progress on quantifying this dependence, via lower and upper bounds, was made in a line of works over the past decade. In this paper, we finally close the gap for approximate-DP and provide a nearly tight upper bound of $\widetilde{O}(\log^* |X|)$, which matches a lower bound by Alon et al (that applies even with improper learning) and improves over a prior upper bound of $\widetilde{O}((\log^* |X|)^{1.5})$ by Kaplan et al. We also provide matching upper and lower bounds of $\tilde{\Theta}(2^{\log^*|X|})$ for the additive error of private quasi-concave optimization (a related and more general problem). Our improvement is achieved via the novel Reorder-Slice-Compute paradigm for private data analysis which we believe will have further applications. View details
    Preview abstract Composition theorems are general and powerful tools that facilitate privacy accounting across multiple data accesses from per-access privacy bounds. However they often result in weaker bounds compared with end-to-end analysis. Two popular tools that mitigate that are the exponential mechanism (or report noisy max) and the sparse vector technique. They were generalized in a couple of recent private selection/test frameworks, including the work by Liu and Talwar (STOC 2019), and Papernot and Steinke (ICLR 2022). In this work, we first present an alternative framework for private selection and testing with a simpler privacy proof and equally-good utility guarantee. Second, we observe that the private selection framework (both previous ones and ours) can be applied to improve the accuracy/confidence trade-off for many fundamental privacy-preserving data-analysis tasks, including query releasing, top-k selection, and stable selection. Finally, for online settings, we apply the private testing to design a mechanism for adaptive query releasing, which improves the sample complexity dependence on the confidence parameter for the celebrated private multiplicative weights algorithm of Hardt and Rothblum (FOCS 2010). View details
    Preview abstract Classical streaming algorithms operate under the (not always reasonable) assumption that the input stream is fixed in advance. Recently, there is a growing interest in designing robust streaming algorithms that provide provable guarantees even when the input stream is chosen adaptively as the execution progresses. We propose a new framework for robust streaming that combines techniques from two recently suggested frameworks by Hassidim et al. [NeurIPS 2020] and by Woodruff and Zhou [FOCS 2021]. These recently suggested frameworks rely on very different ideas, each with its own strengths and weaknesses. We combine these two frameworks into a single hybrid framework that obtains the "best of both worlds", thereby solving a question left open by Woodruff and Zhou. View details
    Preview abstract CountSketch and Feature Hashing (the "hashing trick") are popular randomized dimensionality reduction methods that support recovery of $\ell_2$-heavy hitters (keys $i$ where $v_i^2 > \epsilon \|\boldsymbol{v}\|_2^2$) and approximate inner products. When the inputs are {\em not adaptive} (do not depend on prior outputs), classic estimators applied to a sketch of size $O(\ell/\epsilon)$ are accurate for a number of queries that is exponential in $\ell$. When inputs are adaptive, however, an adversarial input can be constructed after $O(\ell)$ queries with the classic estimator and the best known robust estimator only supports $\tilde{O}(\ell^2)$ queries. In this work we show that this quadratic dependence is in a sense inherent: We design an attack that after $O(\ell^2)$ queries produces an adversarial input vector whose sketch is highly biased. Our attack uses "natural" non-adaptive inputs (only the final adversarial input is chosen adaptively) and universally applies with any correct estimator, including one that is unknown to the attacker. In that, we expose inherent vulnerability of this fundamental method. View details
    Preview abstract \texttt{CountSketch} is a popular dimensionality reduction technique that maps vectors to a lower-dimension using a randomized set of linear measurements. The sketch has the property that the $\ell_2$-heavy hitters of a vector (entries with $v_i^2 \geq \frac{1}{k}\|\boldsymbol{v}\|^2_2$) can be recovered from its sketch. We study the robustness of the sketch in adaptive settings, such as online optimization, where input vectors may depend on the output from prior inputs. We show that the classic estimator can be attacked with a number of queries of the order of the sketch size and propose a robust estimator (for a slightly modified sketch) that allows for quadratic number of queries. We improve robustness by a factor of $\sqrt{k}$ (for $k$ heavy hitters) over prior approaches. View details
    Preview abstract Differentially private algorithms for common metric aggregation tasks, such as clustering or averaging, often have limited practicality due to their complexity or to the large number of data points that is required for accurate results. We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points $\cD$ from an unrestricted (pseudo) metric space as input. When $\cD$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a ``stable'' subset $\cC \subseteq \cD$ that includes all points, except possibly few outliers, and is {\em guaranteed} to have diameter $r$. $\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy. Surprisingly, $\mathsf{FriendlyCore}$ is light-weight with no dependence on the dimension. We empirically demonstrate its advantages in boosting the accuracy of mean estimation and clustering tasks such as $k$-means and $k$-GMM, outperforming tailored methods. View details
    Preview abstract Common datasets have the form of elements with keys (e.g., transactions and products) and the goal is to perform analytics on the aggregated form of key and frequency pairs. A weighted sample of keys by (a function of) frequency is a highly versatile summary that provides a sparse set of representative keys and supports approximate evaluations of query statistics. We propose private weighted sampling (PWS): A method that sanitizes a weighted sample as to ensure element-level differential privacy, while retaining its utility to the maximum extent possible. PWS maximizes the reporting probabilities of keys and estimation quality of a broad family of statistics. PWS improves over the state of the art even for the well-studied special case of private histograms, when no sampling is performed. We empirically observe significant performance gains of 20%-300% increase in key reporting for common Zipfian frequency distributions and accurate estimation with X2-8 lower frequencies. PWS is applied as a post-processing of a non-private sample, without requiring the original data. Therefore, it can be a seamless addition to existing implementations, such as those optimizes for distributed or streamed data. We believe that due to practicality and performance, PWS may become a method of choice in applications where privacy is desired. View details
    Preview abstract Clustering is a fundamental problem in data analysis. In differentially private clustering, the goal is to identify k cluster centers without disclosing information on individual data points. Despite significant research progress, the problem had so far resisted practical solutions. In this work we aim at providing simple implementable differentially private clustering algorithms that provide utility when the data is "easy", e.g., when there exists a significant separation between the clusters. We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results. We are able to get improved sample complexity bounds in some cases of Gaussian mixtures and k-means. We complement our theoretical analysis with an empirical evaluation on synthetic data. View details
    Preview abstract Optimization of a machine learning model is typically carried out by performing stochastic gradient updates on epochs that consist of randomly ordered training examples. This practice means that eachfraction of an epoch comprises an independent random sample of the training data that may not preserve informative structure present in the full data. We hypothesize that the training can be more effective, allowing each epoch to provide some of the benefits of multiple ones, with more principled, ``self-similar'' arrangements. Our case study is matrix factorization, commonly used to learn metric embeddings of entities such as videos or words from example associations. We construct arrangements that preserve the weighted Jaccard similarities of rows and columns and experimentally observe that our arrangements yield training acceleration of 3\%-30\% on synthetic and recommendation datasets. Principled arrangements of training examples emerge as a novel and potentially powerful performance knob for SGD that merits further exploration. View details
    Preview abstract Graph-based semi-supervised learning (SSL) algorithms predict labels for all nodes based on provided labels of a small set of seed nodes. Classic methods capture the graph structure through some underlying diffusion process that propagates through the graph edges. Spectral diffusion, which includes personalized page rank and label propagation can be formulated as repeated weighted averaging of the label vectors of adjacent nodes. Social diffusion propagates through shortest paths. A common ground to these diffusions is their {\em linearity}, which does not distinguish between contributions of few ``strong'' relations and many ``weak'' relations. Recently, non-linear methods such as node embeddings and graph convolutional networks (GCN) demonstrated a large gain in quality for SSL tasks. These methods introduce multiple components and greatly vary on how the graph structure, seed label information, and other features are used. We aim here to study the contribution of non-linearity, as an isolated ingredient, to the performance gain. To do so, we place classic linear graph diffusions in a self-training framework. Surprisingly, we observe that the resulting {\em bootstrapped diffusions} not only significantly improve over the respective non-bootstrapped baselines but also outperform state-of-the-art non-linear methods. Moreover, since the self-training wrapper retains the scalability of the base method, we obtain both higher quality and better scalability. View details
    Preview abstract Clustering of data points is a fundamental tool in data analysis. We consider points $X$ in a relaxed metric space, where the triangle inequality holds within a constant factor. A clustering of $X$ is a partition of $X$ defined by a set of points $Q$ ({\em centroids}), according to the closest centroid. The {\em cost} of clustering $X$ by $Q$ is $V(Q)=\sum_{x\in X} d_{xQ}$. This formulation generalizes classic $k$-means clustering, which uses squared distances. Two basic tasks, parametrized by $k \geq 1$, are {\em cost estimation}, which returns (approximate) $V(Q)$ for queries $Q$ such that $|Q|=k$ and {\em clustering}, which returns an (approximate) minimizer of $V(Q)$ of size $|Q|=k$. When the data set $X$ is very large, we seek efficient constructions of small samples that can act as surrogates for performing these tasks. Existing constructions that provide quality guarantees, however, are either worst-case, and unable to benefit from structure of real data sets, or make explicit strong assumptions on the structure. We show here how to avoid both these pitfalls using adaptive designs. The core of our design are the novel {\em one2all} probabilities, computed for a set $M$ of centroids and $\alpha \geq 1$: The clustering cost of {\em each} $Q$ with cost $V(Q) \geq V(M)/\alpha$ can be estimated well from a sample of size $O(\alpha |M|\epsilon^{-2})$. For cost estimation, we apply one2all with a bicriteria approximate $M$, while adaptively balancing $|M|$ and $\alpha$ to optimize sample size per quality. For clustering, we present a wrapper that adaptively applies a base clustering algorithm to a sample $S$, using the smallest sample that provides the desired statistical guarantees on quality. We demonstrate experimentally the huge gains of using our adaptive instead of worst-case methods. View details
    Preview abstract One of the most common statistics computed over data elements is the number of distinct keys. A thread of research pioneered by Flajolet and Martin three decades ago culminated in the design of optimal approximate counting sketches, which have size that is double logarithmic in the number of distinct keys and provide estimates with a small relative error. Moreover, the sketches are composable, and thus suitable for streamed, parallel, or distributed computation. We consider here all statistics of the frequency distribution of keys, where a contribution of a key to the aggregate is concave and grows (sub)linearly with its frequency. These fundamental aggregations are very common in text, graphs, and logs analysis and include logarithms, low frequency moments, and capping statistics. We design composable sketches of double-logarithmic size for all concave sublinear statistics. Our design combines theoretical optimality and practical simplicity. In a nutshell, we specify tailored mapping functions of data elements to output elements so that our target statistics on the data elements is approximated by the (max-) distinct statistics of the output elements, which can be approximated using off-the-shelf sketches. Our key insight is relating these target statistics to the {\em complement Laplace} transform of the input frequencies. View details
    Preview abstract The study of graph-based submodular maximization problems was initiated in a seminal work of Kempe, Kleinberg, and Tardos (2003): An {\em influence} function of subsets of nodes is defined by the graph structure and the aim is to find subsets of seed nodes with (approximately) optimal tradeoff of size and influence. Applications include viral marketing, monitoring, and active learning of node labels. This powerful formulation was studied for (generalized) {\em coverage} functions, where the influence of a seed set on a node is the maximum utility of a seed item to the node, and for pairwise {\em utility} based on reachability, distances, or reverse ranks. We define a rich class of influence functions which unifies and extends previous work beyond coverage functions and specific utility functions. We present a meta-algorithm for approximate greedy maximization with strong approximation quality guarantees and worst-case near-linear computation for all functions in our class. Our meta-algorithm generalizes a recent design by Cohen et al (2014) that was specific for distance-based coverage functions. View details
    Preview abstract Distances in a network capture relations between nodes and are the basis of centrality, similarity, and influence measures.Often, however, the relevance of a node $u$ to a node $v$ is more precisely measured not by the magnitude of the distance, but by the number of nodes that are closer to $v$ than $u$. That is, by the {\em rank} of $u$ in an ordering of nodes by increasing distance from $v$. We identify and address fundamental challenges in rank-based graph mining. We first consider single-source computation of reverse-ranks and design a ``Dijkstra-like'' algorithm which computes nodes in order of increasing approximate reverse rank while only traversing edges adjacent to returned nodes. We then define {\em reverse-rank influence}, which naturally extends reverse nearest neighbors influence [Korn and Muthukrishnan 2000] and builds on a well studied distance-based influence. We present near-linear algorithms for greedy approximate reverse-rank influence maximization. The design relies on our single-source algorithm. Our algorithms utilize near-linear preprocessing of the network to compute all-distance sketches. As a contribution of independent interest, we present a novel algorithm for computing these sketches, which have many other applications, on multi-core architectures. We complement our algorithms by establishing the hardness of computing {\em exact} reverse-ranks for a single source and {\em exact} reverse-rank influence. This implies that when using near-linear algorithms, the small relative errors we obtain are the best we can currently hope for. Finally, we conduct an experimental evaluation on graphs with tens of millions of edges, demonstrating both scalability and accuracy. View details
    Preview abstract Key value data sets of the form $\{(x,w_x)\}$ where $w_x >0$ are prevalent. Common queries over such data are {\em segment $f$-statistics} $Q(f,H) = \sum_{x\in H}f(w_x)$, specified for a segment $H$ of the keys and a function $f$. Different choices of $f$ correspond to count, sum, moments, cap, and threshold statistics. When the data set is large, we can compute a smaller sample from which we can quickly estimate statistics. A weighted sample of keys taken with respect to $f(w_x)$ provides estimates with statistically guaranteed quality for $f$-statistics. Such a sample $S^{(f)}$ can be used to estimate $g$-statistics for $g\not=f$, but quality degrades with the disparity between $g$ and $f$. In this paper we address applications that require quality estimates for a set $F$ of different functions. A naive solution is to compute and work with a different sample $S^{(f)}$ for each $f\in F$. Instead, this can be achieved more effectively and seamlessly using a single {\em multi-objective} sample $S^{(F)}$ of a much smaller size. We review multi-objective sampling schemes and place them in our context of estimating $f$-statistics. We show that a multi-objective sample for $F$ provides quality estimates for any $f$ that is a positive linear combination of functions from $F$. We then establish a surprising and powerful result when the target set $M$ is {\em all} monotone non-decreasing functions, noting that $M$ includes most natural statistics. We provide efficient multi-objective sampling algorithms for $M$ and show that a sample size of $k \ln n$ (where $n$ is the number of active keys) provides the same estimation quality, for any $f\in M$, as a dedicated weighted sample of size $k$ for $f$. View details
    Multi-Objective Weighted Sampling
    CoRR, vol. abs/1509.07445 (2015)
    Average Distance Queries through Weighted Samples in Graphs and Metric Spaces: High Scalability with Tight Statistical Guarantees
    Shiri Chechik
    Haim Kaplan
    CoRR, vol. abs/1503.08528 (2015)
    Stream Sampling for Frequency Cap Statistics
    KDD (2015), pp. 159-168
    Average Distance Queries through Weighted Samples in Graphs and Metric Spaces: High Scalability with Tight Statistical Guarantees
    Shiri Chechik
    Haim Kaplan
    APPROX-RANDOM (2015), pp. 659-679
    Min-Hash Sketches
    Encyclopedia of Algorithms (2015)
    Coordinated Sampling
    Encyclopedia of Algorithms (2015)
    All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis
    IEEE Trans. Knowl. Data Eng., vol. 27 (2015), pp. 2320-2334
    Stream Sampling for Frequency Cap Statistics
    CoRR, vol. abs/1502.05955 (2015)
    Reverse Ranking by Graph Structure: Model and Scalable Algorithms
    Eliav Buchnik
    CoRR, vol. abs/1506.02386 (2015)
    All-Distances Sketches
    Encyclopedia of Algorithms (2015)
    Computing classic closeness centrality, at scale
    Daniel Delling
    Thomas Pajor
    Renato F. Werneck
    COSN (2014), pp. 37-50
    Sketch-based Influence Maximization and Computation: Scaling up with Guarantees
    Daniel Delling
    Thomas Pajor
    Renato F. Werneck
    CIKM (2014), pp. 629-638
    Probe scheduling for efficient detection of silent failures
    Haim Kaplan
    Yishay Mansour
    Yoav Tzur
    Perform. Eval., vol. 79 (2014), pp. 73-89
    Estimation for monotone sampling: competitiveness and customization
    PODC (2014), pp. 124-133
    Sketch-based Influence Maximization and Computation: Scaling up with Guarantees
    Daniel Delling
    Thomas Pajor
    Renato F. Werneck
    CoRR, vol. abs/1408.6282 (2014)
    Timed Influence: Computation and Maximization
    Daniel Delling
    Thomas Pajor
    Renato F. Werneck
    CoRR, vol. abs/1410.6976 (2014)
    All-distances sketches, revisited: HIP estimators for massive graphs analysis
    PODS (2014), pp. 88-99
    Variance Competitiveness for Monotone Estimation: Tightening the Bounds
    CoRR, vol. abs/1406.6490 (2014)
    Computing Classic Closeness Centrality, at Scale
    Daniel Delling
    Thomas Pajor
    Renato F. Werneck
    CoRR, vol. abs/1409.0035 (2014)
    Distance queries from sampled data: accurate and efficient
    KDD (2014), pp. 681-690
    Author retrospective for search and replication in unstructured peer-to-peer networks
    Qin Lv
    Pei Cao
    Kai Li
    Scott Shenker
    ICS 25th Anniversary (2014), pp. 64-82
    Algorithms and estimators for summarization of unaggregated data streams
    Nick G. Duffield
    Haim Kaplan
    Carsten Lund
    Mikkel Thorup
    J. Comput. Syst. Sci., vol. 80 (2014), pp. 1214-1244
    A Labeling Approach to Incremental Cycle Detection
    Amos Fiat
    Haim Kaplan
    Liam Roditty
    CoRR, vol. abs/1310.8381 (2013)
    Scalable Neighborhood Sketching and Distance Distribution Estimation in Graph Datasets: Revisited, Unified, and Improved
    CoRR, vol. abs/1306.3284 (2013)
    Scheduling Subset Tests: One-Time, Continuous, and How They Relate
    Haim Kaplan
    Yishay Mansour
    APPROX-RANDOM (2013), pp. 81-95
    Scalable similarity estimation in social networks: closeness, node labels, and random edge lengths
    Daniel Delling
    Fabian Fuchs
    Andrew V. Goldberg
    Moisés Goldszmidt
    Renato F. Werneck
    COSN (2013), pp. 131-142
    Probe Scheduling for Efficient Detection of Silent Failures
    Haim Kaplan
    Yishay Mansour
    Yoav Tzur
    CoRR, vol. abs/1302.0792 (2013)
    What You Can Do with Coordinated Samples
    Haim Kaplan
    APPROX-RANDOM (2013), pp. 452-467
    On the Tradeoff between Stability and Fit
    Graham Cormode
    Nick G. Duffield
    Carsten Lund
    CoRR, vol. abs/1302.2137 (2013)
    Envy-Free Makespan Approximation
    Michal Feldman
    Amos Fiat
    Haim Kaplan
    Svetlana Olonetsky
    SIAM J. Comput., vol. 41 (2012), pp. 12-25
    A Case for Customizing Estimators: Coordinated Samples
    Haim Kaplan
    CoRR, vol. abs/1212.0243 (2012)
    What you can do with Coordinated Samples
    Haim Kaplan
    CoRR, vol. abs/1206.5637 (2012)
    Don't let the negatives bring you down: sampling from streams of signed updates
    Graham Cormode
    Nick G. Duffield
    SIGMETRICS (2012), pp. 343-354
    How to Estimate Change from Samples
    Haim Kaplan
    CoRR, vol. abs/1203.4903 (2012)
    Structure-Aware Sampling: Flexible and Accurate Summarization
    Graham Cormode
    Nick G. Duffield
    CoRR, vol. abs/1102.5146 (2011)
    Truth, Envy, and Truthful Market Clearing Bundle Pricing
    Michal Feldman
    Amos Fiat
    Haim Kaplan
    Svetlana Olonetsky
    WINE (2011), pp. 97-108
    Get the most out of your sample: optimal unbiased estimators using partial information
    Haim Kaplan
    PODS (2011), pp. 13-24
    Structure-aware sampling on data streams
    Graham Cormode
    Nick G. Duffield
    SIGMETRICS (2011), pp. 197-208
    Structure-Aware Sampling: Flexible and Accurate Summarization
    Graham Cormode
    Nick G. Duffield
    PVLDB, vol. 4 (2011), pp. 819-830
    Efficient Stream Sampling for Variance-Optimal Estimation of Subset Sums
    Nick G. Duffield
    Haim Kaplan
    Carsten Lund
    Mikkel Thorup
    SIAM J. Comput., vol. 40 (2011), pp. 1402-1431
    Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information
    Haim Kaplan
    CoRR, vol. abs/1109.1325 (2011)
    Truth and Envy in Capacitated Allocation Games
    Michal Feldman
    Amos Fiat
    Haim Kaplan
    Svetlana Olonetsky
    CoRR, vol. abs/1003.5326 (2010)
    Labeling Dynamic XML Trees
    Haim Kaplan
    Tova Milo
    SIAM J. Comput., vol. 39 (2010), pp. 2048-2074
    Envy-free makespan approximation: extended abstract
    Michal Feldman
    Amos Fiat
    Haim Kaplan
    Svetlana Olonetsky
    EC (2010), pp. 159-166
    On the Interplay between Incentive Compatibility and Envy Freeness
    Michal Feldman
    Amos Fiat
    Haim Kaplan
    Svetlana Olonetsky
    CoRR, vol. abs/1003.5328 (2010)
    Leveraging Discarded Samples for Tighter Estimation of Multiple-Set Aggregates
    Haim Kaplan
    CoRR, vol. abs/0903.0625 (2009)
    Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments
    Haim Kaplan
    Subhabrata Sen
    PVLDB, vol. 2 (2009), pp. 646-657
    Leveraging discarded samples for tighter estimation of multiple-set aggregates
    Haim Kaplan
    SIGMETRICS/Performance (2009), pp. 251-262
    Stream sampling for variance-optimal estimation of subset sums
    Nick G. Duffield
    Haim Kaplan
    Carsten Lund
    Mikkel Thorup
    SODA (2009), pp. 1255-1264
    Envy-Free Makespan Approximation
    Michal Feldman
    Amos Fiat
    Haim Kaplan
    Svetlana Olonetsky
    CoRR, vol. abs/0909.1072 (2009)
    Composable, Scalable, and Accurate Weight Summarization of Unaggregated Data Sets
    Nick G. Duffield
    Haim Kaplan
    Carsten Lund
    Mikkel Thorup
    PVLDB, vol. 2 (2009), pp. 431-442
    Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments
    Haim Kaplan
    Subhabrata Sen
    CoRR, vol. abs/0906.4560 (2009)
    Decay Models
    Encyclopedia of Database Systems (2009), pp. 757-761
    Sketch-Based Estimation of Subpopulation-Weight
    Haim Kaplan
    CoRR, vol. abs/0802.3448 (2008)
    Confident estimation for multistage measurement sampling and aggregation
    Nick G. Duffield
    Carsten Lund
    Mikkel Thorup
    SIGMETRICS (2008), pp. 109-120
    Estimating Aggregates over Multiple Sets
    Haim Kaplan
    ICDM (2008), pp. 761-766
    Processing top-k queries from samples
    Nadav Grossaug
    Haim Kaplan
    Computer Networks, vol. 52 (2008), pp. 2605-2622
    Tighter estimation using bottom k sketches
    Haim Kaplan
    PVLDB, vol. 1 (2008), pp. 213-224
    Variance optimal sampling based estimation of subset sums
    Nick G. Duffield
    Haim Kaplan
    Carsten Lund
    Mikkel Thorup
    CoRR, vol. abs/0803.0473 (2008)
    Sketching unaggregated data streams for subpopulation-size queries
    Nick G. Duffield
    Haim Kaplan
    Carsten Lund
    Mikkel Thorup
    PODS (2007), pp. 253-262
    Associative search in peer to peer networks: Harnessing latent semantics
    Amos Fiat
    Haim Kaplan
    Computer Networks, vol. 51 (2007), pp. 1861-1881
    Algorithms and estimators for accurate summarization of internet traffic
    Nick G. Duffield
    Haim Kaplan
    Carsten Lund
    Mikkel Thorup
    Internet Measurement Comference (2007), pp. 265-278
    Spatially-decaying aggregation over a network
    Haim Kaplan
    J. Comput. Syst. Sci., vol. 73 (2007), pp. 265-288
    Summarizing data using bottom-k sketches
    Haim Kaplan
    PODC (2007), pp. 225-234
    Bottom-k sketches: better and more efficient estimation of aggregates
    Haim Kaplan
    SIGMETRICS (2007), pp. 353-354
    Maintaining time-decaying stream aggregates
    Martin J. Strauss
    J. Algorithms, vol. 59 (2006), pp. 19-36
    Processing top k queries from samples
    Nadav Grossaug
    Haim Kaplan
    CoNEXT (2006), pp. 7
    A short walk in the Blogistan
    Balachander Krishnamurthy
    Computer Networks, vol. 50 (2006), pp. 615-630
    Making routing robust to changing traffic demands: algorithms and evaluation
    David Applegate
    IEEE/ACM Trans. Netw., vol. 16 (2006), pp. 1193-1206
    Performance aspects of distributed caches using TTL-based consistency
    Eran Halperin
    Haim Kaplan
    Theor. Comput. Sci., vol. 331 (2005), pp. 73-96
    Packet classification in large ISPs: design and evaluation of decision tree classifiers
    Carsten Lund
    SIGMETRICS (2005), pp. 73-84
    Guest Editors' foreword
    Venkatesan Guruswami
    J. Comput. Syst. Sci., vol. 68 (2004), pp. 701
    Balanced-Replication Algorithms for Distribution Trees
    Haim Kaplan
    SIAM J. Comput., vol. 34 (2004), pp. 227-247
    Spatially-decaying aggregation over a network: model and algorithms
    Haim Kaplan
    SIGMOD Conference (2004), pp. 707-718
    Optimal oblivious routing in polynomial time
    Yossi Azar
    Amos Fiat
    Haim Kaplan
    Harald Räcke
    J. Comput. Syst. Sci., vol. 69 (2004), pp. 383-394
    Efficient estimation algorithms for neighborhood variance and other moments
    Haim Kaplan
    SODA (2004), pp. 157-166
    Coping with network failures: routing strategies for optimal demand oblivious restoration
    David Applegate
    Lee Breslau
    SIGMETRICS (2004), pp. 270-281
    Efficient sequences of trials
    Amos Fiat
    Haim Kaplan
    SODA (2003), pp. 737-746
    Optimal oblivious routing in polynomial time
    Yossi Azar
    Amos Fiat
    Haim Kaplan
    Harald Räcke
    STOC (2003), pp. 383-388
    A case for associative peer to peer overlays
    Amos Fiat
    Haim Kaplan
    Computer Communication Review, vol. 33 (2003), pp. 95-100
    Reachability and Distance Queries via 2-Hop Labels
    Eran Halperin
    Haim Kaplan
    Uri Zwick
    SIAM J. Comput., vol. 32 (2003), pp. 1338-1355
    Associative Search in Peer to Peer Networks: Harnessing Latent Semantics
    Amos Fiat
    Haim Kaplan
    INFOCOM (2003)
    Making intra-domain routing robust to changing and uncertain traffic demands: understanding fundamental tradeoffs
    David Applegate
    SIGCOMM (2003), pp. 313-324
    Predicting and bypassing end-to-end Internet service degradations
    Anat Bremler-Barr
    Haim Kaplan
    Yishay Mansour
    IEEE Journal on Selected Areas in Communications, vol. 21 (2003), pp. 961-978
    Connection caching: model and algorithms
    Haim Kaplan
    Uri Zwick
    J. Comput. Syst. Sci., vol. 67 (2003), pp. 92-126
    Proactive caching of DNS records: addressing a performance bottleneck
    Haim Kaplan
    Computer Networks, vol. 41 (2003), pp. 707-726
    Maintaining time-decaying stream aggregates
    Martin Strauss
    PODS (2003), pp. 223-233
    Reachability and distance queries via 2-hop labels
    Eran Halperin
    Haim Kaplan
    Uri Zwick
    SODA (2002), pp. 937-946
    Search and replication in unstructured peer-to-peer networks
    Qin Lv
    Pei Cao
    Kai Li
    Scott Shenker
    ICS (2002), pp. 84-95
    Balanced-Replication Algorithms for Distribution Trees
    Haim Kaplan
    ESA (2002), pp. 297-309
    Caching Documents with Variable Sizes and Fetching Costs: An LP-Based Approach
    Haim Kaplan
    Algorithmica, vol. 32 (2002), pp. 459-466
    Exploiting Regularities in Web Traffic Patterns for Cache Replacement
    Haim Kaplan
    Algorithmica, vol. 33 (2002), pp. 300-334
    Search and replication in unstructured peer-to-peer networks
    Qin Lv
    Pei Cao
    Kai Li
    Scott Shenker
    SIGMETRICS (2002), pp. 258-259
    Restoration by path concatenation: fast recovery of MPLS paths
    Yehuda Afek
    Anat Bremler-Barr
    Haim Kaplan
    Michael Merritt
    Distributed Computing, vol. 15 (2002), pp. 273-283
    Prefetching the means for document transfer: a new approach for reducing Web latency
    Haim Kaplan
    Computer Networks, vol. 39 (2002), pp. 437-455
    Replication strategies in unstructured peer-to-peer networks
    Scott Shenker
    SIGCOMM (2002), pp. 177-190
    Predicting and bypassing end-to-end internet service degradations
    Anat Bremler-Barr
    Haim Kaplan
    Yishay Mansour
    Internet Measurement Workshop (2002), pp. 307-320
    Refreshment policies for Web content caches
    Haim Kaplan
    Computer Networks, vol. 38 (2002), pp. 795-808
    Labeling Dynamic XML Trees
    Haim Kaplan
    Tova Milo
    PODS (2002), pp. 271-281
    Competitive Analysis of the LRFU Paging Algorithm
    Haim Kaplan
    Uri Zwick
    Algorithmica, vol. 33 (2002), pp. 511-516
    Restoration by path concatenation: fast recovery of MPLS paths
    Anat Bremler-Barr
    Yehuda Afek
    Haim Kaplan
    Michael Merritt
    PODC (2001), pp. 43-52
    Restoration path concatenation: fast recovery of MPLS paths
    Anat Bremler-Barr
    Yehuda Afek
    Haim Kaplan
    Michael Merritt
    SIGMETRICS/Performance (2001), pp. 316-317
    Proactive Caching of DNS Records: Addressing a Performance Bottleneck
    Haim Kaplan
    SAINT (2001), pp. 85-94
    Refreshment Policies for Web Content Caches
    Haim Kaplan
    INFOCOM (2001), pp. 1398-1406
    Performance Aspects of Distributed Caches Using TTL-Based Consistency
    Eran Halperin
    Haim Kaplan
    ICALP (2001), pp. 744-756
    All-Pairs Small-Stretch Paths
    Uri Zwick
    J. Algorithms, vol. 38 (2001), pp. 335-353
    The Age Penalty and Its Effect on Cache Performance
    Haim Kaplan
    USITS (2001), pp. 73-84
    Competitive Analysis of the LRFU Paging Algorithm
    Haim Kaplan
    Uri Zwick
    WADS (2001), pp. 148-154
    Finding Interesting Associations without Support Pruning
    Mayur Datar
    Shinji Fujiwara
    Aristides Gionis
    Piotr Indyk
    Rajeev Motwani
    Jeffrey D. Ullman
    Cheng Yang
    IEEE Trans. Knowl. Data Eng., vol. 13 (2001), pp. 64-78
    Aging through cascaded caches: performance issues in the distribution of web content
    Haim Kaplan
    SIGCOMM (2001), pp. 41-53
    Connection caching under vaious models of communication
    Haim Kaplan
    Uri Zwick
    SPAA (2000), pp. 54-63
    Finding Interesting Associations without Support Pruning
    Mayur Datar
    Shinji Fujiwara
    Aristides Gionis
    Piotr Indyk
    Rajeev Motwani
    Jeffrey D. Ullman
    Cheng Yang
    ICDE (2000), pp. 489-499
    Polylog-time and near-linear work approximation scheme for undirected shortest paths
    J. ACM, vol. 47 (2000), pp. 132-166
    Prefetching the Means for Document Transfer: A New Approach for Reducing Web Latency
    Haim Kaplan
    INFOCOM (2000), pp. 854-863
    Connection Caching
    Haim Kaplan
    Uri Zwick
    STOC (1999), pp. 612-621
    Approximating Matrix Multiplication for Pattern Recognition Tasks
    David D. Lewis
    J. Algorithms, vol. 30 (1999), pp. 211-252
    Efficient Algorithms for Predicting Requests to Web Servers
    Balachander Krishnamurthy
    Jennifer Rexford
    INFOCOM (1999), pp. 284-293
    Exploiting Regularities in Web Traffic Patterns for Cache Replacement
    Haim Kaplan
    STOC (1999), pp. 109-118
    Managing TCP Connections Under Persistent HTTP
    Haim Kaplan
    Computer Networks, vol. 31 (1999), pp. 1709-1723
    LP-based Analysis of Greedy-dual-size
    Haim Kaplan
    SODA (1999), pp. 879-880
    Structure Prediction and Computation of Sparse Matrix Products
    J. Comb. Optim., vol. 2 (1998), pp. 307-332
    Improving End-to-End Performance of the Web Using Server Volumes and Proxy Filters
    Balachander Krishnamurthy
    Jennifer Rexford
    SIGCOMM (1998), pp. 241-253
    Fast Algorithms for Constructing t-Spanners and Paths with Stretch t
    SIAM J. Comput., vol. 28 (1998), pp. 210-236
    Evaluating Server-Assisted Cache Replacement in the Web
    Balachander Krishnamurthy
    Jennifer Rexford
    ESA (1998), pp. 307-319
    Using Selective Path-Doubling for Parallel Shortest-Path Computations
    J. Algorithms, vol. 22 (1997), pp. 30-56
    Size-Estimation Framework with Applications to Transitive Closure and Reachability
    J. Comput. Syst. Sci., vol. 55 (1997), pp. 441-453
    Approximating Matrix Multiplication for Pattern Recognition Tasks
    David D. Lewis
    SODA (1997), pp. 682-691
    Learning Noisy Perceptrons by a Perceptron in Polynomial Time
    FOCS (1997), pp. 514-523
    All-Pairs Small-Stretch Paths
    Uri Zwick
    SODA (1997), pp. 93-102
    On Optimizing Multiplications of Sparse Matrices
    IPCO (1996), pp. 219-233
    Efficient Parallel Shortest-Paths in Digraphs with a Separator Decomposition
    J. Algorithms, vol. 21 (1996), pp. 331-357
    Approximate Max-Flow on Small Depth Networks
    SIAM J. Comput., vol. 24 (1995), pp. 579-597
    Improvise: Interactive Multimedia Process Visualization Environment
    Naser S. Barghouti
    Eleftherios Koutsofios
    ESEC (1995), pp. 28-43
    Estimating the Size of the Transitive Closure in Linear Time
    FOCS (1994), pp. 190-200
    Improved Algorithms for Linear Inequalities With Two Variables per Inequality
    Nimrod Megiddo
    SIAM J. Comput., vol. 23 (1994), pp. 1313-1347
    Polylog-time and near-linear work approximation scheme for undirected shortest paths
    STOC (1994), pp. 16-26
    Algorithms and Complexity Analysis for Some Flow Problems
    Nimrod Megiddo
    Algorithmica, vol. 11 (1994), pp. 320-340
    New algorithms for generalized network flows
    Nimrod Megiddo
    Math. Program., vol. 64 (1994), pp. 325-336
    Fast algorithms for constructing t-spanners and paths with stretch t
    FOCS (1993), pp. 648-658
    Strongly Polynomial-Time and NC Algorithms for Detecting Cycles in Periodic Graphs
    Nimrod Megiddo
    J. ACM, vol. 40 (1993), pp. 791-830
    Using Selective Path-Doubling for Parallel Shortest-Path Computations
    ISTCS (1993), pp. 78-87
    Efficient Parallel Shortest-Paths in Digraphs with a Separator Decomposition
    SPAA (1993), pp. 57-67
    Approximate Max Flow on Small Depth Networks
    FOCS (1992), pp. 648-658
    New Algorithms for Generalized Network Flows
    Nimrod Megiddo
    ISTCS (1992), pp. 103-114
    Algorithms and Complexity Analysis for Some Flow Problems
    Nimrod Megiddo
    SODA (1991), pp. 120-130
    NP-Completeness of graph decomposition problems
    Michael Tarsi
    J. Complexity, vol. 7 (1991), pp. 200-212
    Improved Algorithms for Linear Inequalities with Two Variables per Inequality (Extended Abstract)
    Nimrod Megiddo
    STOC (1991), pp. 145-155
    Strongly Polynomial-Time and NC Algorithms for Detecting Cycles in Dynamic Graphs (Preliminary Version)
    Nimrod Megiddo
    STOC (1989), pp. 523-534