Flowless: Extracting Densest Subgraphs Without Flow Computations
Abstract
The problem of finding dense components of a graph is a major primitive in graph mining and data analysis. The densest subgraph problem (DSP) that asks to find a subgraph with maximum average degree forms a basic primitive in dense subgraph discovery with applications ranging from community detection to unsupervised discovery of biological network modules. The DSP is exactly solvable in polynomial time using maximum flows. Due to the high computational cost of maximum flows, Charikar's greedy approximation algorithm is usually preferred in practice due to its linear time and linear space complexity. It constitutes a key algorithmic idea in scalable solutions for large-scale dynamic graphs. However, its output density can be a factor 2 off the optimal solution.
In this paper we design Greedy++, an iterative peeling algorithm that improves upon the performance of Charikar's greedy algorithm significantly. Our iterative greedy algorithm is able to output near-optimal and optimal solutions fast by adding a few more passes to Charikar's greedy algorithm. Furthermore Greedy++ is more robust against the structural heterogeneities (e.g., skewed degree distributions) in real-world datasets. An additional property of our algorithm is that it is able to assess quickly, without computing maximum flows, whether Charikar's approximation quality on a given graph instance is closer to the worst case theoretical guarantee of 1/2 or to optimality. We also demonstrate that our method has significant efficiency advantage over the maximum flow based exact optimal algorithm. For example, our algorithm achieves ~145x speedup on average across a variety of real-world graphs while finding subgraphs of density that are at least 90% as dense as the densest subgraph.
In this paper we design Greedy++, an iterative peeling algorithm that improves upon the performance of Charikar's greedy algorithm significantly. Our iterative greedy algorithm is able to output near-optimal and optimal solutions fast by adding a few more passes to Charikar's greedy algorithm. Furthermore Greedy++ is more robust against the structural heterogeneities (e.g., skewed degree distributions) in real-world datasets. An additional property of our algorithm is that it is able to assess quickly, without computing maximum flows, whether Charikar's approximation quality on a given graph instance is closer to the worst case theoretical guarantee of 1/2 or to optimality. We also demonstrate that our method has significant efficiency advantage over the maximum flow based exact optimal algorithm. For example, our algorithm achieves ~145x speedup on average across a variety of real-world graphs while finding subgraphs of density that are at least 90% as dense as the densest subgraph.