Learned TPU Cost Model for XLA Tensor Programs

Mike Burrows
Samuel J. Kaufman
Workshop on ML for Systems at NeurIPS (2019)

Abstract

At Google, we would like to develop a cost model that can accurately estimate the execution time of a machine learning model running on a Tensor Processing Unit (TPU). This cost model can be used by a compiler to make heuristic decisions, by an autotuner to find an optimal configuration of a specific program, and by Neural Architecture Search to co-optimize accuracy and inference time. However, building an accurate analytical cost model is challenging because of the complexity of modern processors. We propose to learn a cost model using a neural network. Our cost model uses a feedforward neural network to predict execution time from a graph embedding based on GraphSAGE. Our model’s mean predictions are within 13% of the actual execution time.
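
The pipeline described in the abstract (a GraphSAGE-based graph embedding fed to a feedforward network that outputs an execution-time estimate) might be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the node features, the aggregator, the number of layers, the pooling step, or the training procedure, so the mean aggregator, the mean pooling, the layer sizes, the toy graph, and all names (`sage_layer`, `predict_runtime`, `init_params`) are assumptions. The weights are random and untrained, so the printed number is not a meaningful runtime.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these from measured runtimes."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sage_layer(h, neighbors, W):
    """One GraphSAGE-style layer with a mean aggregator (an assumed choice).

    h:         list of per-node feature vectors
    neighbors: adjacency list, neighbors[v] = indices of v's neighbors
    W:         weight matrix of shape (out_dim, 2 * in_dim)
    """
    dim = len(h[0])
    out = []
    for v, nbrs in enumerate(neighbors):
        if nbrs:
            agg = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            agg = [0.0] * dim
        # Concatenate the node's own vector with the neighbor mean, then project.
        out.append(relu(matvec(W, h[v] + agg)))
    return out


def init_params(in_dim, hidden_dim, num_layers, seed=0):
    rng = random.Random(seed)
    sage, d = [], in_dim
    for _ in range(num_layers):
        sage.append(rand_matrix(hidden_dim, 2 * d, rng))
        d = hidden_dim
    return {
        "sage": sage,
        "ff_hidden": rand_matrix(hidden_dim, hidden_dim, rng),
        "ff_out": rand_matrix(1, hidden_dim, rng),
    }


def predict_runtime(features, neighbors, params):
    """Embed the graph, then regress a scalar with a feedforward network."""
    h = features
    for W in params["sage"]:
        h = sage_layer(h, neighbors, W)
    # Mean-pool node embeddings into one graph embedding (assumed pooling).
    dim = len(h[0])
    graph_embedding = [sum(node[i] for node in h) / len(h) for i in range(dim)]
    hidden = relu(matvec(params["ff_hidden"], graph_embedding))
    return matvec(params["ff_out"], hidden)[0]


# Hypothetical toy graph: three operations in a chain, three features per node.
features = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.25], [1.0, 1.0, 0.0]]
neighbors = [[1], [0, 2], [1]]
params = init_params(in_dim=3, hidden_dim=4, num_layers=2)
estimate = predict_runtime(features, neighbors, params)
print(estimate)
```

Because both the aggregator and the pooling step are means, the estimate in this sketch does not depend on how the nodes are numbered, which is one reason such aggregators are a common choice for graph-structured programs.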