Jump to Content

Fast and Accurate k-means++ via Rejection Sampling

Ashkan Norouzi Fard
Christian Sohler
Ola Svensson
NeurIPS 2020
Google Scholar


We present the first distributed approximation algorithm for the Euclidean $k$-median problem with an optimal trade-off between memory usage and the number of parallel rounds. Our algorithm even works in the setting where each machine has very limited memory $s\in \Omega(\log n)$ and it is work efficient. In the future, it would be interesting to obtain similar results for other clustering problems and to improve the approximation factor of our algorithm.