Fast and Accurate k-means++ via Rejection Sampling
Abstract
We present the first distributed approximation algorithm for the Euclidean $k$-median problem with an
optimal trade-off between memory usage and the number of parallel rounds. Our algorithm even works in the setting where each machine has very limited memory $s\in \Omega(\log n)$ and it is work efficient. In the future, it would be interesting to obtain similar results for other clustering problems and to improve the approximation factor of our algorithm.
optimal trade-off between memory usage and the number of parallel rounds. Our algorithm even works in the setting where each machine has very limited memory $s\in \Omega(\log n)$ and it is work efficient. In the future, it would be interesting to obtain similar results for other clustering problems and to improve the approximation factor of our algorithm.