Serving a GNN model in online applications is challenging --- one has to propagate the information from training nodes to testing nodes to achieve the best performance, while storing the whole training set (including training graph and node features) during inference time is prohibitive for most of the real world applications. We tackle this serving space compression problem in the paper, where the goal is to compress the storage requirement for GNN serving. Given a model to be served, the proposed method constructs a small set of virtual representative nodes to replace the original training nodes, so that users just need to replace the original training set by this virtual representative set to reduce the space requirement for serving, without the need of changing the actual GNN model and the forward pass.
We carefully analyze the error in the forward pass and derive simple ways to construct the node features and graph of virtual representative nodes to minimize the approximation error. Experimental results demonstrate that the proposed method can significantly reduce the serving space requirement for GNN inference.