Jump to Content
Bryan Perozzi

Bryan Perozzi

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, desc
  • Year
  • Year, desc
    Machine Learning on Graphs: A Model and Comprehensive Taxonomy
    Ines Chami
    Sami Abu-El-Haija
    Chris Ré
    Journal of Machine Learning Research, vol. 23 (2022), pp. 1-64
    Preview abstract There has been a surge of recent interest in graph representation learning (GRL). GRL methods have generally fallen into three main categories, based on the availability of labeled data. The first, network embedding, focuses on learning unsupervised representations of relational structure. The second, graph regularized neural networks, leverages graphs to augment neural network losses with a regularization objective for semi-supervised learning. The third, graph neural networks, aims to learn differentiable functions over discrete topologies with arbitrary structure. However, despite the popularity of these areas there has been surprisingly little work on unifying the three paradigms. Here, we aim to bridge the gap between network embedding, graph regularization and graph neural networks. We propose a comprehensive taxonomy of GRL methods, aiming to unify several disparate bodies of work. Specifically, we propose the GraphEDM framework, which generalizes popular algorithms for semi-supervised learning (e.g. GraphSage, GCN, GAT), and unsupervised learning (e.g. DeepWalk, node2vec) of graph representations into a single consistent approach. To illustrate the generality of GraphEDM, we fit over thirty existing methods into this framework. We believe that this unifying view both provides a solid foundation for understanding the intuition behind these methods, and enables future research in the area. View details
    GraphWorld: Fake Graphs Bring Real Insights for GNNs
    Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022)
    Preview abstract The continuing maturity of the deep learning subfield of graph neural networks (GNNs) has motivated recent studies into the standard datasets used to benchmark GNNs. While important improvements have been made to GNN datasets and experimental design, any one dataset provides only a singular, potentially spurious insight into the performance of any GNN being tested. We show that state-of-the-art GNN task datasets do not cover the distribution of graphs in a much larger real-data graph repository, with respect to several key graph metrics. Motivated by this finding, we introduce GraphWorld, a novel distributed framework and software package for testing GNN models on an arbitrarily-large population of \emph{synthetic} task datasets. GraphWorld allows a user to efficiently generate a \emph{world} of millions of graph datasets, with fine-grained control over graph generator parameters, and benchmark arbitrary GNN models, with built-in hyperparameter tuning. Using GraphWorld to generate diverse graph worlds corresponding to node classification, graph classification, and link prediction tasks, we provide insight into the sensitivity of 10,000+ GNN models to various parameters of graphs and node features and} show comparisons between models that have not been possible to make in any previous work. We also introduce a novel metric with which to explore each models' performance on the graph world, conditioning on graph metrics and graph generator parameters. View details
    Zero-shot Transfer Learning within a Heterogeneous Graph via Knowledge Transfer Networks
    Minji Yoon
    Ziniu Hu
    Ruslan Salakhutdinov
    Advances in Neural Information Processing Systems (2022)
    Preview abstract Data continuously emitted from industrial ecosystems such as social or e-commerce platforms are commonly represented as heterogeneous graphs (HG) composed of multiple node/edge types. State-of-the-art graph learning methods for HGs known as heterogeneous graph neural networks (HGNNs) are applied to learn deep context-informed node representations. However, many HG datasets from industrial applications suffer from label imbalance between node types. As there is no direct way to learn using labels rooted at different node types, HGNNs have been applied to only a few node types with abundant labels. We propose a zero-shot transfer learning module for HGNNs called a Knowledge Transfer Network (KTN) that transfers knowledge from label-abundant node types to zero-labeled node types through rich relational information given in the HG. KTN is derived from the theoretical relationship, which we introduce in this work, between distinct feature extractors for each node type given in an HGNN model. KTN improves the performance of 6 different types of HGNN models by up to 960% for inference on zero-labeled node types and outperforms state-of-the-art transfer learning baselines by up to 73% across 18 different transfer learning tasks on HGs. View details
    Preview abstract Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding. However, while data privacy is one of the most important recent concerns, existing PPR algorithms are not designed to protect user privacy. PPR is highly sensitive to the input graph edges: the difference of only one edge may cause a big change in the PPR vector, potentially leaking private user data. In this work, we propose an algorithm which outputs an approximate PPR and has provably bounded sensitivity to input edges. In addition, we prove that our algorithm achieves similar accuracy to non-private algorithms when the input graph has large degrees. Our sensitivity-bounded PPR directly implies private algorithms for several tools of graph learning, such as, differentially private (DP) PPR ranking, DP node classification, and DP node embedding. To complement our theoretical analysis, we also empirically verify the practical performances of our algorithms. View details
    Preview abstract Graph Neural Networks (GNNs) have achieved state-of-the-art results on many graph analysis tasks such as node classification and link prediction. However, important unsupervised problems on graphs, such as graph clustering, have proved more resistant to advances in GNNs. In this paper, we study unsupervised training of GNN pooling in terms of their clustering capabilities. We start by drawing a connection between graph clustering and graph pooling: intuitively, a good graph clustering is what one would expect from a GNN pooling layer. Counterintuitively, we show that this is not true for state-of-the-art pooling methods, such as MinCut pooling. To address these deficiencies, we introduce Deep Modularity Networks (DMoN), an unsupervised pooling method inspired by the modularity measure of clustering quality, and show how it tackles recovery of the challenging clustering structure of real-world graphs. In order to clarify the regimes where existing methods fail, we carefully design a set of experiments on synthetic data which show that DMoN is able to jointly leverage the signal from the graph structure and node attributes. Similarly, on real-world data, we show that DMoN produces high quality clusters which correlate strongly with ground truth labels, achieving state-of-the-art results. View details
    Debiasing Graph Embeddings with Metadata-Orthogonal Training
    2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2020)
    Preview abstract In many real world graphs, the formation of edges can be influenced by certain sensitive features of the nodes (e.g. their gender, community, or reputation). In this paper we argue that when such influences exist, any downstream Graph Neural Network (GNN) will be implicitly biased by these structural correlations. To allow control over this phenomenon, we introduce the Metadata-Orthogonal Node Embedding Training (MONET) unit, a general neural network architecture component for performing training-time linear debiasing of graph embeddings. MONET operates by ensuring that the node embeddings are trained on a hyperplane orthogonal to that of the node features (metadata). Unlike debiasing approaches in similar domains, our method offers exact guarantees about the correlation between the resulting embeddings and any sensitive metadata. We illustrate the effectiveness of MONET though our experiments on a variety of real world graphs against challenging baselines (e.g. adversarial debiasing), showing superior performance in tasks such as preventing the leakage of political party affiliation in a blog network, and preventing the gaming of embedding-based recommendation systems. View details
    Examining COVID-19 Forecasting using Spatio-Temporal Graph Neural Networks
    Amol Kapoor
    Xue Ben
    Martin Blais
    Shawn O'Banion
    MLG workshop @ KDD'2020, epiDAMIK workshop @ KDD'2020 (2020) (to appear)
    Preview abstract In this work, we examine a novel forecasting approach for COVID-19 case prediction that uses Graph Neural Networks and mobility data. In contrast to existing time series forecasting models, the proposed approach learns from a single large-scale spatio-temporal graph, where nodes represent the region-level human mobility, spatial edges represent the human mobility based inter-region connectivity, and temporal edges represent node features through time. We evaluate this approach on the US county level COVID-19 dataset, and demonstrate that the rich spatial and temporal information leveraged by the graph neural network allows the model to learn complex dynamics. We show a 6% reduction of RMSLE and an absolute Pearson Correlation improvement from 0.9978 to 0.998 compared to the best performing baseline models. This novel source of information combined with graph based deep learning approaches can be a powerful tool to understand the spread and evolution of COVID-19. We encourage others to further develop a novel modeling paradigm for infectious disease based on GNNs and high resolution mobility data. View details
    Scaling Graph Neural Networks with Approximate PageRank
    Aleksandar Bojchevski
    Johannes Klipera
    Amol Kapoor
    Martin Blais
    Benedek András Rózemberczki
    Stephan Günnnemann
    KDD (2020)
    Preview abstract Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, despite their successes on small datasets, efficiently utilizing them on massive web-scale data remains a challenge. All recently proposed scalable GNN approaches rely on a message passing procedure to propagate information on the graph, leading to expensive recursive neighborhood expansion (and aggregation) schemes during both training and inference. This limitation is particularly problematic if we want to consider neighbors that are multiple hops away. In contrast, by leveraging connections between GNNs and personalized PageRank, we develop a model that incorporates multi-hop neighborhood information in a single (non-recursive) step. Our model \model, is significantly faster than previous scalable approaches while maintaining state-of-the-art prediction performance. Moreover, our algorithm can produce a scalability certificate which guarantees that the predictions will not change if we would have used instead a more expensive non-scalable baseline. To demonstrate the strengths and the scalability of our approach we both evaluate on existing datasets, and propose a new large scale graph learning setting, using the open academic graph (90M nodes, 3B edges). Additionally, we discuss the practical applications of large-scale semi-supervised learning, like \model~ at Google to solve node classification problems. View details
    Grale: Designing Networks for Graph Learning
    Alexandru Moșoi
    Sam Ruth
    Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Association for Computing Machinery (2020), 2523–2532
    Preview abstract How can we find the right graph for semi-supervised learning? In real world applications, the choice of which edges to use for computation is the first step in any graph learning process. Interestingly, there are often many types of similarity available to choose as the edges between nodes, and the choice of edges can drastically affect the performance of downstream semi-supervised learning systems. However, despite the importance of graph design, most of the literature assumes that the graph is static. In this work, we present Grale, a scalable method we have developed to address the problem of graph design for graphs with billions of nodes. Grale operates by fusing together different measures of (potentially weak) similarity to create a graph which exhibits high task-specific homophily between its nodes. Grale is designed for running on large datasets. We have deployed Grale in more than 20 different industrial settings at Google, including datasets which have tens of billions of nodes, and hundreds of trillions of potential edges to score. By employing locality sensitive hashing techniques, we greatly reduce the number of pairs that need to be scored, allowing us to learn a task specific model and build the associated nearest neighbor graph for such datasets in hours, rather than the days or even weeks that might be required otherwise. We illustrate this through a case study where we examine the application of Grale to an abuse classification problem on YouTube with hundreds of million of items. In this application, we find that Grale detects a large number of malicious actors on top of hard-coded rules and content classifiers, increasing the total recall by 89% over those approaches alone. View details
    N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification
    Sami Abu-El-Haija
    Amol Kapoor
    Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (2019)
    Preview abstract Graph Convolutional Networks (GCNs) have shown significant improvements in semi-supervised learning on graph-structured data. Concurrently, unsupervised learning of graph embeddings has benefited from the information contained in random walks. In this paper, we propose a model: Network of GCNs (NGCN), which marries these two lines of work. At its core, N-GCN trains multiple instances of GCNs over node pairs discovered at different distances in random walks, and learns a combination of the instance outputs which optimizes the classification objective. Our experiments show that our proposed N-GCN model improves state-of-the-art baselines on all of the challenging node classification tasks we consider: Cora, Citeseer, Pubmed, and PPI. In addition, our proposed method has other desirable properties, including generalization to recently proposed semi-supervised learning methods such as GraphSAGE, allowing us to propose N-SAGE, and resilience to adversarial input perturbations. View details
    Preview abstract Can neural networks learn to compare graphs without feature engineering? In this paper, we show that it is possible to learn representations for graph similarity with neither domain knowledge nor supervision (i.e. feature engineering or labeled graphs). We propose Deep Divergence Graph Kernels, an unsupervised method for learning representations over graphs that encodes a relaxed notion of graph isomorphism. Our method consists of three parts. First, we learn an encoder for each anchor graph to capture its structure. Second, for each pair of graphs, we train a cross-graph attention network which uses the node representations of an anchor graph to reconstruct another graph. This approach, which we call isomorphism attention, captures how well the representations of one graph can encode another. We use the attention-augmented encoder's predictions to define a divergence score for each pair of graphs. Finally, we construct an embedding space for all graphs using these pair-wise divergence scores. Unlike previous work, much of which relies on 1) supervision, 2) domain specific knowledge (e.g. a reliance on Weisfeiler-Lehman kernels), and 3) known node alignment, our unsupervised method jointly learns node representations, graph representations, and an attention-based alignment between graphs. Our experimental results show that Deep Divergence Graph Kernels can learn an unsupervised alignment between graphs, and that the learned representations achieve competitive results when used as features on a number of challenging graph classification tasks. Furthermore, we illustrate how the learned attention allows insight into the the alignment of sub-structures across graphs. View details
    Preview abstract Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph -- a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to $90\%$. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure. View details
    MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing
    Sami Abu-El-Haija
    Amol Kapoor
    Hrayr Harutyunyan
    Nazanin Alipourfard
    Kristina Lerman
    Greg Ver Steeg
    Aram Galstyan
    The Thirty-sixth International Conference on Machine Learning (ICML) (2019)
    Preview abstract Recently, many methods have been proposed for semi-supervised learning that extend the convolutional operator from Euclidean domains to graph-structured data by approximating the eigenbasis of the graph Laplacian. However, despite their prevalence, there has not been extensive analysis of the expressive power of these models. In this work, we prove that popular methods (such as the Graph Convolutional Network) do not model and cannot learn a class of neighborhood difference relationships which we call \delta operators. To address this weakness, we propose a new model, MixHop, that can capture these difference relationships by learning mixed feature representations of neighbors at various distances. MixHop requires no additional memory or computational complexity, and outperforms challenging baselines on several graph datasets including citation networks, synthetic graphs, and molecule classification for quantum chemistry. Furthermore, we quantify how the model prioritizes neighborhood information across different network datasets by adding a sparsity regularizer. View details
    Preview abstract Graph embedding methods represent nodes in a continuous vector space, preserving information from the graph (e.g. by sampling random walks). There are many hyper-parameters to these methods (such as random walk length) which have to be manually tuned for every graph. In this paper, we replace random walk hyper-parameters with trainable parameters that we automatically learn via backpropagation. In particular, we learn a novel attention model on the power series of the transition matrix, which guides the random walk to optimize an upstream objective. Unlike previous approaches to attention models, the method that we propose utilizes attention parameters exclusively on the data (e.g. on the random walk), and not used by the model for inference. We experiment on link prediction tasks, as we aim to produce embeddings that best-preserve the graph structure, generalizing to unseen information. We improve state-of-the-art on a comprehensive suite of real world datasets including social, collaboration, and biological networks. Adding attention to random walks can reduce the error by 20% to 45% on datasets we attempted. Further, our learned attention parameters are different for every graph, and our automatically-found values agree with the optimal choice of hyper-parameter if we manually tune existing methods. View details
    HARP: Hierarchical Representation Learning for Networks
    Haochen Chen
    Yifan Hu
    Steven Skiena
    AAAI'18 (2018) (to appear)
    Preview abstract We present HARP, a novel method for learning low dimensional embeddings of a graph's nodes which preserves higher-order structural features. Our proposed method achieves this by compressing the input graph prior to embedding it, effectively avoiding troublesome embedding configurations (i.e. local minima) which can pose problems to non-convex optimization. HARP works by finding a smaller graph which approximates the global structure of its input. This simplified graph is used to learn a set of initial representations, which serve as good initializations for learning representations in the original, detailed graph. We inductively extend this idea, by decomposing a graph in a series of levels, and then embed the hierarchy of graphs from the coarsest one to the original graph. HARP is a general meta-strategy to improve all of the state-of-the-art neural algorithms for embedding graphs, including DeepWalk, LINE, and Node2vec. Indeed, we demonstrate that applying HARP's hierarchical paradigm yields improved implementations for all three of these methods, as evaluated on both classification tasks on real-world graphs such as DBLP, BlogCatalog, CiteSeer, and Arxiv, where we achieve a performance gain over the original implementations by up to 14% Macro F1. View details
    Learning Edge Representations via Low-Rank Asymmetric Projections
    ACM International Conference on Information and Knowledge Management (2017) (to appear)
    Preview abstract We propose a new method for embedding graphs while preserving directed edge information. Learning such continuous-space vector representations (or embeddings) of nodes in a graph is an important first step for using network information (from social networks, user-item graphs, knowledge bases, etc.) in many machine learning tasks. Unlike previous work, we (1) explicitly model an edge as a function of node embeddings, and we (2) propose a novel objective, the "graph likelihood", which contrasts information from sampled random walks with non-existent edges. Individually, both of these contributions improve the learned representations, especially when there are memory constraints on the total size of the embeddings. When combined, our contributions enable us to significantly improve the state-of-the-art by learning more concise representations that better preserve the graph structure. We evaluate our method on a variety of link-prediction task including social networks, collaboration networks, and protein interactions, showing that our proposed method learn representations with error reductions of up to 76% and 55%, on directed and undirected graphs. In addition, we show that the representations learned by our method are quite space efficient, producing embeddings which have higher structure-preserving accuracy but are 10 times smaller. View details
    When Recommendation Goes Wrong - Anomalous Link Discovery in Recommendation Networks
    Michael Schueppert
    Jack Saalweachter
    Mayur Thakur
    Proceedings of the 22th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016)
    Preview abstract We present a secondary ranking system to find and remove erroneous suggestions from a geospatial recommendation system. We discover such anomalous links by “double checking” the recommendation system’s output to ensure that it is both structurally cohesive, and semantically consistent. Our approach is designed for the Google Related Places Graph, a geographic recommendation system which provides results for hundreds of millions of queries a day. We model the quality of a recommendation between two geographic entities as a function of their structure in the Related Places Graph, and their semantic relationship in the Google Knowledge Graph. To evaluate our approach, we perform a large scale human evaluation of such an anomalous link detection system. For the long tail of unpopular entities, our models can predict the recommendations users will consider poor with up to 42% higher mean precision (29 raw points) than the live system. Results from our study reveal that structural and semantic features capture different facets of relatedness to human judges. We characterize our performance with a qualitative analysis detailing the categories of real-world anomalies our system is able to detect, and provide a discussion of additional applications of our method. View details
    DeepWalk: Online Learning of Social Representations
    Steven Skiena
    Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2014), pp. 701-710
    Polyglot: Distributed Word Representations for Multilingual NLP
    Steven Skiena
    Proceedings of the Seventeenth Conference on Computational Natural Language Learning (2013), pp. 183-192