Google Research

Parallel Spectral Clustering

  • Yangqiu Song
  • Wen-Yen Chen
  • Hongjie Bai
  • Chih-Jen Lin
  • Edward Chang
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), Springer (2008), pp. 374-389

Abstract

Spectral clustering algorithm has been shown to be more e ective in nding clusters than most traditional algorithms. However, spectral clustering su ers from a scalability problem in both memory use and computational time when a dataset size is large. To perform clustering on large datasets, we propose to parallelize both memory use and computation on distributed computers. Through an empirical study on a large document dataset of 193,844 data instances and a large photo dataset of 637,137, we demonstrate that our parallel algorithm can effectively alleviate the scalability problem.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work