Google Research

Fast and Accurate k-means++ via Rejection Sampling

  • Ashkan Norouzi Fard
  • Christian Sohler
  • Ola Svensson
  • Silvio Lattanzi
  • Vincent Pierre Cohen-addad
NeurIPS 2020

Abstract

We present the first distributed approximation algorithm for the Euclidean $k$-median problem with an optimal trade-off between memory usage and the number of parallel rounds. Our algorithm even works in the setting where each machine has very limited memory $s\in \Omega(\log n)$ and it is work efficient. In the future, it would be interesting to obtain similar results for other clustering problems and to improve the approximation factor of our algorithm.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work