Google Research

Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces

NeurIPS '21 (2021)

Abstract

In this paper we study the geometric $(1, z)$-clustering problem: given $n$ points in $\R^d$, find the point $x$ that minimizes the sum of the Euclidean distances from $x$ to the input points, each raised to the power $z$. This problem interpolates between the well-known Fermat-Weber problem (also called the geometric median problem), where $z = 1$, and the Minimum Enclosing Ball problem, where $z = \infty$.
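For concreteness, the objective and its extreme cases can be written out explicitly. This uses notation introduced here, not taken from the paper: the input points are $p_1, \dots, p_n$ and the objective is $\operatorname{cost}_z$.

$$\operatorname{cost}_z(x) = \sum_{i=1}^{n} \lVert p_i - x \rVert_2^{\,z}, \qquad x^\star \in \arg\min_{x \in \R^d} \operatorname{cost}_z(x).$$

For $z = 1$ this is the geometric median. For $z = 2$ the minimizer is the centroid $\frac{1}{n}\sum_{i} p_i$. As $z \to \infty$, the objective behaves as follows:

$$\lim_{z \to \infty} \operatorname{cost}_z(x)^{1/z} = \max_{1 \le i \le n} \lVert p_i - x \rVert_2,$$

The minimizer of this limit is the center of the minimum enclosing ball.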

Our contribution is the design of a precise estimator that samples only a constant number of points, independent of $n$ and $d$. Namely, for any $\eps > 0$, we show that sampling $O(\eps^{-z})$ input points uniformly at random is enough to find a center whose sum of $z$-th powers of distances to the input points is within a $(1+\eps)$-factor of the optimum. We also provide a lower bound, showing that any such algorithm must sample at least $\Omega(\eps^{-z})$ points.
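The sampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. Several pieces are hypothetical:

* the helper names `power_cost`, `mm_center` and `sampled_center`;
* the placeholder constant `c` in the sample size $m = \lceil c\,\eps^{-z} \rceil$;
* the inner solver.

The inner solver is a swapped-in majorize-minimize (iteratively reweighted mean) iteration that is only valid for $1 \le z \le 2$. For $z = 2$ it reduces to the centroid, and for $z = 1$ it is Weiszfeld's algorithm.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def power_cost(points: Sequence[Point], center: Point, z: float) -> float:
    """Sum over all points of the Euclidean distance to `center`, raised to the power z."""
    return sum(math.dist(p, center) ** z for p in points)


def mm_center(points: Sequence[Point], z: float, iters: int = 200,
              floor: float = 1e-12) -> List[float]:
    """Hypothetical helper: approximate minimizer of power_cost for 1 <= z <= 2.

    Each step is a majorize-minimize update. Because t -> t**(z/2) is concave,
    the cost is bounded above by a weighted least-squares surrogate with
    weights dist**(z - 2), and that surrogate is minimized by a weighted mean.
    z == 2 gives the centroid; z == 1 gives Weiszfeld's iteration.
    """
    if not 1 <= z <= 2:
        raise ValueError("this sketch only handles 1 <= z <= 2")
    d = len(points[0])
    # Start from the centroid, which is already optimal when z == 2.
    x = [sum(p[j] for p in points) / len(points) for j in range(d)]
    for _ in range(iters):
        # `floor` keeps the weights finite when x lands exactly on an input point.
        w = [max(math.dist(p, x), floor) ** (z - 2) for p in points]
        total = sum(w)
        x = [sum(wi * p[j] for wi, p in zip(w, points)) / total for j in range(d)]
    return x


def sampled_center(points: Sequence[Point], z: float, eps: float,
                   c: float = 1.0, seed: int = 0) -> List[float]:
    """Hypothetical sketch: draw m = ceil(c * eps**-z) points uniformly at random
    (with replacement) and minimize the cost on the sample only.

    `c` is a placeholder for the constant hidden in the O(.) bound.
    """
    rng = random.Random(seed)
    m = math.ceil(c * eps ** (-z))
    sample = rng.choices(points, k=m)
    return mm_center(sample, z)
```

As a usage example, one can compare `power_cost(points, sampled_center(points, z, eps), z)` against `power_cost(points, mm_center(points, z), z)` on a few thousand points. The sample size depends only on `eps` and `z`, never on `len(points)`, which is the property the abstract highlights. The $(1+\eps)$ guarantee itself comes from the paper's analysis, not from this sketch.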

This implies an algorithm that computes a $(1+\eps)$-approximation in time $O(d \eps^{-z-5})$, generalizing the result of Cohen et al. [STOC '16] for the geometric median ($z = 1$) to arbitrary $z$. Furthermore, an algorithm with low query complexity has a good privacy guarantee: we show that our algorithm is $(0, O(1/n))$-differentially private. This can be used to construct the first differentially private algorithm for $(k, z)$-clustering whose additive error is independent of the dimension $d$. It improves on the algorithm of Ghazi et al. [NeurIPS '20], which has an additive error of $\sqrt d \cdot \poly(k, \log n)$.
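A standard argument shows why a small sample yields privacy. It is sketched here as an illustration, not necessarily the paper's proof, under two assumptions:

* the algorithm $\mathcal{A}$ reads the data only through $m$ indices drawn uniformly with replacement;
* neighboring datasets $P$ and $P'$ differ only in point $i$.

The notation $\mathcal{A}$, $P$, $P'$ and $m$ is introduced here. Run both executions with the same random indices. Their outputs can differ only if index $i$ is drawn, which by Bernoulli's inequality happens with probability

$$\Pr[\, i \text{ sampled} \,] = 1 - \left(1 - \tfrac{1}{n}\right)^{m} \le \frac{m}{n}.$$

Hence, for every set $S$ of outputs,

$$\Pr[\mathcal{A}(P) \in S] \le \Pr[\mathcal{A}(P') \in S] + \frac{m}{n}.$$

This is $(0, m/n)$-differential privacy. With $m = O(\eps^{-z})$ independent of $n$, it gives $(0, O(1/n))$ for fixed $\eps$ and $z$.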
