Towards Optimal Coreset Bounds for Euclidean k-Median and k-Means

Chris Schwiegelshohn

David Saulpic

Kasper Green Larsen

Vincent Pierre Cohen-addad

54rd Annual ACM Symposium on Theory of Computing (STOC'22) (2022)

Google Scholar

Abstract

The classic Euclidean $k$-Median and $k$-means problems consists of finding a set of $k$ points, such that the sum of Euclidean distances, respectively squared Euclidean distances, of every data point to its closest center is minimized. Coresets are a small, usually weighted subset of the input point set preserving the cost of any candidate solution up to a multiplicative $(1\pm \varepsilon)$ factor. Most coreset constructions focus on improving the size, i.e. the number of distinct points in the coreset. The arguably biggest open question in this line of research is to find tight bounds for Euclidean $k$-median and $k$-means. In this paper, we show the following.
\begin{itemize}
\item A coreset construction consisting of $O(k\cdot \log k \cdot \varepsilon^{-3}\cdot \log^{O(1)}\varepsilon^{-1})$ points for the k-median problem in arbitrary dimensions.

\item An unconditional lower bound showing that $\Omega(k\cdot \varepsilon^{-2})$ points are necessary for any coreset for either problem.

\item An unconditional lower bound showing that $\Omega(k \cdot \varepsilon^{-2} \cdot \log n}
points are necessary for any coreset of either problem in general metric spacesé
\end{itemize}
The two bounds for Euclidean space are tight up to a factor (epsilon^{-1} polylog (k/epsilon). In addition, our techniques improve state of the art bounds for clustering with higher powers of Euclidean distances as well.
The bound for general metrics in tight up to a factor polylog (k/epsilon).

Research Areas

Algorithms and Theory

Defining the technology of today and tomorrow.

Philosophy

People

Research areas

Foundational ML & Algorithms

Computing Systems & Quantum AI

Science, AI & Society

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Towards Optimal Coreset Bounds for Euclidean k-Median and k-Means

Abstract

Research Areas

Meet the teams driving innovation