Google Research

Hierarchical Clustering of Data Streams: Scalable Algorithms and Approximation Guarantees

  • Anand Rajagopalan
  • Claudio Gentile
  • Danny Vainstein
  • Fabio Vitale
  • Gui Citovsky
  • Magda Procopiuc
ICML (2021)


We investigate the problem of hierarchically clustering data streams containing metric data in R^d. We introduce a desirable invariance property for such algorithms, describe a general family of hyperplane-based methods enjoying this property, and analyze two scalable instances of this general family against recently popularized similarity/dissimilarity-based metrics for hierarchical clustering. We prove a number of new results related to the approximation ratios of these algorithms, improving in various ways over the literature on this subject. Finally, since our algorithms are principled but also very practical, we carry out an experimental comparison on both synthetic and real-world datasets showing competitive results against known baselines.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work