Graph Transformer: A Generalized Method for Computation Graph Optimizations

Amirali Abdolrashidi
Azalia Mirhoseini
Daniel Wong
Hanxiao Liu
Mangpo Phothilimthana
Qiumin Xu
Shen Wang
Sudip Roy
(2020)
Google Scholar

Abstract

Runtime and scalability of neural networks can be significantly affected by computational graph optimization during compilation. Most existing automated graph optimizations are impractical for deployment due to the significant amount of compute required and their inability to generalize to new, previously held-out graphs. To address both limitations, we propose an end-to-end deep reinforcement learning method named Graph Transformer (GTf), based on a scalable sequential attention mechanism over an inductive graph neural network that is transferable to new, unseen graphs. GTf generates decisions on the entire graph in a single-shot fashion, rather than on each individual node progressively, drastically speeding up the search compared to prior methods. Moreover, we propose recurrent attention layers to jointly optimize dependent graph optimization tasks and demonstrate 33%-60% speedup of three graph optimization tasks compared to Tensorflow default optimizations. On a diverse set of representative graphs consisting of 1k-80k nodes, including Inception-v3, Transformer-XL, and WaveNet, GTf achieves an average 21% improvement over human experts and 18% improvement over the prior art with 15x faster convergence, on a device placement task evaluated in real systems.