Affinity Clustering: Hierarchical Clustering at Scale

MohammadHossein Bateni; Soheil Behnezhad; Mahsa Derakhshan; MohammadTaghi Hajiaghayi; Raimondas Kiveris; Silvio Lattanzi; Vahab Mirrokni

NIPS 2017, pp. 6867-6877

Abstract

Graph clustering is a fundamental task in many data-mining and machine-learning pipelines. In particular, identifying a good hierarchical clustering structure is both a fundamental and a challenging problem for several applications. In many of these applications, the amount of data to analyze is growing rapidly, so new solutions are needed to efficiently compute effective hierarchical clusterings on such huge data.

In this paper, we propose algorithms to address this problem. First, we analyze minimum spanning tree-based clustering algorithms and their corresponding hierarchical clusterings. In particular, we consider classic single-linkage clustering based on Kruskal's algorithm and a variation of Borůvka's algorithm that we call affinity clustering, and we prove new properties of these clusterings via the concept of certificates. Then we present new algorithms in the MapReduce model and their efficient real-world implementations via Distributed Hash Tables (DHTs). Our MapReduce algorithms also improve upon previous MapReduce algorithms for finding a minimum spanning tree in graphs. Finally, we show experimentally that our algorithms scale to huge data and are competitive with state-of-the-art algorithms. In particular, we show that affinity clustering is in practice superior to several state-of-the-art clustering algorithms.
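
To make the Borůvka-style idea concrete, the following is a minimal single-machine sketch, not the MapReduce or DHT implementation the abstract refers to. It assumes an undirected weighted graph given as `(weight, u, v)` tuples; the function name `affinity_clustering` and the union-find helper are hypothetical choices made for illustration. In each round every current cluster selects its cheapest outgoing edge, the selected edges are merged, and the resulting labeling is recorded as one level of the hierarchy.

```python
def affinity_clustering(num_vertices, edges):
    """Borůvka-style rounds over an undirected weighted graph.

    edges: list of (weight, u, v) tuples with 0 <= u, v < num_vertices.
    Returns a list of labelings (one per round); labels[x] is the
    representative vertex of the cluster containing x after that round.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    levels = []
    while True:
        # Step 1: each current cluster picks its cheapest outgoing edge.
        # Ties are broken by comparing the full (weight, u, v) tuple.
        best = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge is internal to a cluster
            for r in (ru, rv):
                if r not in best or (w, u, v) < best[r]:
                    best[r] = (w, u, v)
        if not best:
            break  # no cluster has an outgoing edge left

        # Step 2: merge clusters along the selected edges.
        for w, u, v in best.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

        # Step 3: record this round's clustering as one hierarchy level.
        levels.append([find(x) for x in range(num_vertices)])
    return levels


# Path graph 0 -1- 1 -5- 2 -2- 3: the two light edges merge first.
levels = affinity_clustering(4, [(1, 0, 1), (5, 1, 2), (2, 2, 3)])
for round_no, labels in enumerate(levels, start=1):
    print(round_no, labels)
```

Because every cluster that still has an outgoing edge merges with at least one other cluster in each round, the number of such clusters at least halves per round, which is the usual reason Borůvka-style schemes suit round-limited parallel settings.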

Research Areas

Algorithms and theory
