Extreme k-Center Clustering

Google Scholar

Abstract

Metric clustering is a fundamental primitive in machine learning with several applications for mining massive data-sets. An important example of metric clustering is the $k$-center problem. While this problem has been extensively studied in distributed settings, all previous algorithms require $\Omega(k)$ space per machine and $\Omega(n k)$ total work.

In this paper, we develop the first highly scalable approximation algorithm for $k$-center clustering requiring $o(k)$ space per machine with $o(n k)$ total work. In particular, our algorithm needs $\widetilde{O}(n^{\eps})$ space per machine and $\tilde{O}(n^{1+\epsilon})$ total work, and computes an $O(\log \log \log n)$-approximation of the problem by selecting $(1+o(1))k$ centers in $O(\log \log n)$ rounds. This is achieved by introducing core-sets of truly sublinear size.