Scaling Graph Neural Networks with Approximate PageRank
Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, despite their successes on small datasets, efficiently utilizing them on massive web-scale data remains a challenge. All recently proposed scalable GNN approaches rely on a message passing procedure to propagate information on the graph, leading to expensive recursive neighborhood expansion (and aggregation) schemes during both training and inference. This limitation is particularly problematic if we want to consider neighbors that are multiple hops away. In contrast, by leveraging connections between GNNs and personalized PageRank, we develop a model that incorporates multi-hop neighborhood information in a single (non-recursive) step. Our model \model, is significantly faster than previous scalable approaches while maintaining state-of-the-art prediction performance. Moreover, our algorithm can produce a scalability certificate which guarantees that the predictions will not change if we would have used instead a more expensive non-scalable baseline. To demonstrate the strengths and the scalability of our approach we both evaluate on existing datasets, and propose a new large scale graph learning setting, using the open academic graph (90M nodes, 3B edges). Additionally, we discuss the practical applications of large-scale semi-supervised learning, like \model~ at Google to solve node classification problems.