Google Research

Reliable Distributed Clustering with Redundant Data Assignment

ICML Workshop on Coding Theory for Large-scale Machine Learning (2019)

Abstract

In this work we present distributed generalized clustering algorithms (with k-means and PCA as special cases) that can handle large-scale data spread across multiple machines, even when some of those machines are straggling or unreliable. We propose a novel data assignment scheme that lets us obtain global information about the data even when some machines fail to respond. The assignment scheme leads to distributed algorithms with good approximation guarantees for a variety of clustering and dimensionality reduction problems.
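
To make the idea of redundant assignment concrete, the following is a minimal sketch, not the scheme proposed in the paper (the abstract does not specify it). It assumes a simple cyclic replication in which each data point is stored on `replication` distinct machines, and it checks by brute force that every point is still held by at least one responding machine when some machines fail to respond. All function and parameter names here are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def cyclic_assignment(num_points, num_machines, replication):
    """Assign point i to machines i, i+1, ..., i+replication-1 (mod num_machines).

    Illustrative assumption only: this is plain cyclic replication, not the
    assignment scheme from the paper. Requires replication <= num_machines so
    that the copies of a point land on distinct machines.
    """
    if not 1 <= replication <= num_machines:
        raise ValueError("replication must be between 1 and num_machines")
    assignment = {m: [] for m in range(num_machines)}
    for i in range(num_points):
        for j in range(replication):
            assignment[(i + j) % num_machines].append(i)
    return assignment


def covered_points(assignment, responding):
    """Return the set of points held by at least one responding machine."""
    covered = set()
    for m in responding:
        covered.update(assignment[m])
    return covered


def tolerates_stragglers(num_points, num_machines, replication, num_stragglers):
    """Check every possible set of num_stragglers failed machines.

    Returns True only if all points remain covered in every failure pattern.
    """
    assignment = cyclic_assignment(num_points, num_machines, replication)
    all_points = set(range(num_points))
    machines = range(num_machines)
    for failed in combinations(machines, num_stragglers):
        responding = [m for m in machines if m not in failed]
        if covered_points(assignment, responding) != all_points:
            return False
    return True
```

Under this assumed scheme, each point lives on `replication` distinct machines, so any `replication - 1` failures leave at least one copy reachable. For example, `tolerates_stragglers(12, 6, 3, 2)` holds, while allowing three stragglers does not. Coverage of every point is only a necessary ingredient; the approximation guarantees mentioned in the abstract depend on the paper's actual construction.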
