Connected Components in MapReduce and Beyond

Raimondas Kiveris; Silvio Lattanzi; Vahab Mirrokni; Vibhor Rastogi; Sergei Vassilvitskii

SOCC 2014

Abstract

Computing the connected components of a graph lies at the core of many data mining algorithms and is a fundamental subroutine in graph clustering. The problem is well studied, yet many of the algorithms with good theoretical guarantees perform poorly in practice, especially on graphs with hundreds of billions of edges. In this paper, we design improved algorithms based on the traditional MapReduce architecture for large-scale data analysis. We also explore the effect of augmenting MapReduce with a distributed hash table (DHT) service. We show that these algorithms have provable theoretical guarantees and easily outperform previously studied algorithms, sometimes by more than an order of magnitude. In particular, our iterative MapReduce algorithms run 3 to 15 times faster than the best previously studied algorithms, and the MapReduce implementation using a DHT is 10 to 30 times faster than the best previously studied algorithms. These are the fastest algorithms that easily scale to graphs with hundreds of billions of edges.
