p-Norm Flow Diffusion for Local Graph Clustering
Abstract
Local graph clustering and the closely related seed set expansion problem are primitives on graphs that are central to a wide range of analytic and learning tasks such as local clustering, community detection, nodes ranking and feature inference. Prior work on local graph clustering mostly falls into two categories with numerical and combinatorial roots respectively.
In this work, we draw inspiration from both fields and propose a family of convex optimization formulations based on the idea of diffusion with $p$-norm network flow for $p\in (1,\infty)$.
In the context of local clustering, we characterize the optimal solutions for these optimization problems and show their usefulness in finding low conductance cuts around input seed set. In particular, we achieve quadratic approximation of conductance in the case of $p=2$ similar to the Cheeger-type bounds of spectral methods, constant factor approximation when $p\rightarrow\infty$ similar to max-flow based methods, and a smooth transition for general $p$ values in between. Thus, our optimization formulation can be viewed as bridging the numerical and combinatorial approaches, and we can achieve the best of both worlds in terms of speed and noise robustness.
We show that the proposed problem can be solved in strongly local running time for $p\ge 2$ and conduct empirical evaluations on both synthetic and real-world graphs to illustrate our approach compares favorably with existing methods.
In this work, we draw inspiration from both fields and propose a family of convex optimization formulations based on the idea of diffusion with $p$-norm network flow for $p\in (1,\infty)$.
In the context of local clustering, we characterize the optimal solutions for these optimization problems and show their usefulness in finding low conductance cuts around input seed set. In particular, we achieve quadratic approximation of conductance in the case of $p=2$ similar to the Cheeger-type bounds of spectral methods, constant factor approximation when $p\rightarrow\infty$ similar to max-flow based methods, and a smooth transition for general $p$ values in between. Thus, our optimization formulation can be viewed as bridging the numerical and combinatorial approaches, and we can achieve the best of both worlds in terms of speed and noise robustness.
We show that the proposed problem can be solved in strongly local running time for $p\ge 2$ and conduct empirical evaluations on both synthetic and real-world graphs to illustrate our approach compares favorably with existing methods.