# Vincent Pierre Cohen-addad

### Research Areas

Authored Publications

Sort By

Streaming Euclidean MST to a Constant Factor

Amit Levi

Erik Waingarten

Xi Chen

54rd Annual ACM Symposium on Theory of Computing (STOC'23) (2023)

Preview abstract
We study streaming algorithms for the fundamental geometric problem of computing the cost of the Euclidean Minimum Spanning Tree (MST) on an $n$-point set $X \subset \R^d$. In the streaming model, the points in $X$ can be added and removed arbitrarily, and the goal is to maintain an approximation in small space. In low dimensions, $(1+\epsilon)$ approximations are possible in sublinear space. However, for high dimensional space the best known approximation for this problem was $\tilde{O}(\log n)$, due to [Chen, Jayaram, Levi, Waingarten, STOC'22], improving on the prior $O(\log^2 n)$ bound due to [Andoni, Indyk, Krauthgamer, SODA '08]. In this paper, we break the logarithmic barrier, and give the first constant factor sublinear space approximation to Euclidean MST. For any $\epsilon\geq 1$, our algorithm achieves an $\tilde{O}(\epsilon^{-2})$ approximation in $n^{O(\epsilon)} d^{O(1)}$ space.
We complement this by demonstrating that any single pass algorithm which obtains a better than $1.10$-approximation must use $\Omega(\sqrt{n})$ space, demonstrating that $(1+\epsilon)$ approximations are not possible in high-dimensions, and that our algorithm is tight up to a constant. Nevertheless, we demonstrate that $(1+\epsilon)$ approximations are possible in sublinear space with $O(1/\epsilon)$ passes over the stream. More generally, for any $\alpha \geq 2$, we give a $\alpha$-pass streaming algorithm which achieves a $O(\frac{1}{ \alpha \epsilon})$ approximation in $n^{O(\epsilon)} d^{O(1)}$ space.
All our streaming algorithms are linear sketches, and therefore extend to the massively-parallel computation model (MPC). Thus, our results imply the first $(1+\epsilon)$-approximation in a constant number of rounds in the MPC model. Previously, such a result was only known for low-dimensional space ([Andoni, Nikolov, Onak, Yaroslavtsev, STOC'15]), or either required $O(\log n)$ rounds or suffered a $O(\log n)$ approximation.
View details

Streaming Euclidean k-median and k-means to a (1 + ε)-approximation with o_{k,ε}(log n) Memory Words

Preview abstract
We consider the classic Euclidean k-median and k-means objective on insertion-only streams,
where the goal is to maintain a (1 + ε)-approximation to the k-median or k-means, while using
as little memory as possible. Over the last 20 years, clustering in data streams has received a
tremendous amount of attention and has been the test-bed for a large variety of new techniques,
including coresets, merge-and-reduce, bicriteria approximation, sensitivity sampling, etc. Despite
this intense effort to obtain smaller and smaller sketches for these problems, all known techniques
require storing at least Ω(log (n∆)) words of memory, where n is size of the input and ∆ is
the aspect ratio. In this paper, we show how to finally bypass this bound and obtain the first
algorithm that achieves a (1 + ε)-approximation to the more general (k, z)-clustering problem,
using only ̃O ( dk / ε^2) · (2^{z log z} ) · min ( 1/ε^z , k) · poly(log log(n∆)) words of memory.
View details

Private estimation algorithms for stochastic block models and mixture models

Hongjie Chen

Tommaso D'Orsi

Jacob Imola

David Steurer

Stefan Tiegel

54rd Annual ACM Symposium on Theory of Computing (STOC'23) (2023)

Preview abstract
We introduce general tools for designing efficient private estimation algorithms, in the high-dimensional settings, whose statistical guarantees almost match those of the best known non-private algorithms. To illustrate our techniques, we consider two problems: recovery of stochastic block models and learning mixtures of spherical Gaussians. For the former, we present the first efficient (ϵ,δ)-differentially private algorithm for both weak recovery and exact recovery. Previously known algorithms achieving comparable guarantees required quasi-polynomial time. For the latter, we design an (ϵ,δ)-differentially private algorithm that recovers the centers of the k-mixture when the minimum separation is at least O(k^{1/t}/√t). For all choices of t, this algorithm requires sample complexity n≥k^O(1)d^O(t) and time complexity (nd)^O(t). Prior work required minimum separation at least O(√k) as well as an explicit upper bound on the Euclidean norm of the centers.
View details

Breaching the 2 LMP Approximation Barrier for Facility Location with Applications to k-Median

Chris Schwiegelshohn

Euiwoong Lee

Fabrizio Grandoni

SODA'23 (2023) (to appear)

Preview abstract
The $k$-Median problem is one the most fundamental clustering problems: Given $n$ points in a metric space and an integer parameter $k$ our goal is to select precisely $k$ points (called centers) so as to minimize the sum of the distances from each point to the closest center. Obtaining better and better approximation algorithms for $k$-Median
is a central open problem.
Over the years, two main approaches have emerged: the first one is the standard Local Search heuristic, yielding a $(3+\eps)$ approximation for any constant $\eps>0$ which is known to be tight.
The second approach is based on a Lagrangian Multiplier
Preserving (LMP) $\alpha$ approximation algorithm for the related facility location problem. This algorithm is used to build an $\alpha$ approximate fractional bipoint solution for $k$-Median, which is then rounded via a $\rho$ approximate bipoint rounding algorithm. Altogether this gives an $\alpha\cdot \rho$ approximation. A lot of progress was made on improving $\rho$, from the initial $2$ by Jain and Vazirani [FOCS'99, J.ACM'01], to $1.367$ by Li and Svensson [STOC'13, SICOMP'16], and finally to $1.338$ by Byrka et al. [SODA'15, TALG'17]. However for almost 20 years no progress was made on $\alpha$, where the current best result is the classical LMP $2$ approximation algorithm JMS for facility location by Jain et al. [STOC'02, \fab{J.ACM'03}] based on the Dual-Fitting technique.
View details

An Improved Local Search Algorithm for k-Median

Anupam Gupta

David Saulpic

Hoon Oh

Lunjia Hu

Symposium On Discrete Algorithms SODA'22 (2022)

Preview abstract
We present a new local-search algorithm for the k-median clustering problem.
We show that local optima for this algorithm give an
(2.836+eps)-approximation; our result improves upon the (3+eps)-approximate
local-search algorithm of Arya et al. (2001). Moreover, a computer-aided
analysis of a natural extension suggests that this approach may lead to a
improvement over the best-known approximation guarantee for the problem (which
is 2.67).
The new ingredient in our algorithm is the use of a potential function based
on both the closest and second-closest facilities to each client.
Specifically, the potential is the sum over all clients, of the distance of
the client to its closest facility, plus (a small constant times) the
truncated distance to its second-closest facility. We move from one solution
to another only if the latter can be obtained by swapping a constant number of
facilities, and has a smaller potential than the former. This refined
potential allows us to avoid the bad local optima given by Arya et al. for the
local-search algorithm based only on the cost of the solution.
View details