Jump to Content

Community Detection in the Heterogeneous Stochastic Block Model

David Saulpic
Frederik Mallmann-Trenn
35th Annual Conference on Learning Theory (COLT 2022) (2022)
Google Scholar


We consider the Heterogeneous Stochastic Block Model (HeSBM) consisting of two unknown planted communities C1 and C2 . Each node u has two (unknown) parameters pu and qu with pu > qu , describing the probability of having a directed edges with nodes from the same com- munity and with nodes from the other commu- nity, respectively. This model allows for almost arbitrary degree distributions. We present a simple, linear time algorithm that recovers the communities whenever (pu −qu)/sqrt(pu) = Ω(log n/n). We also show that this condition is necessary. Note that this recovery threshold is the same as from the standard Stochastic Block Model, in which all nodes have the same pu and qu . Our algorithm is based on local greedy improvement, where nodes are moved into the part where it has more edges. We show how to implement those moves in parallel, such that each vertex needs to be considered at most twice. We believe that this algorithm can be easily implemented in parallel, making it highly practical; and the theoretical guarantee we demonstrate show a desirable robustness to perturbations in the degree distribution