Learned TPU Cost Model for XLA Tensor Programs

Mike Burrows
Samuel J. Kaufman
Workshop on ML for Systems at NeurIPS (2019)

Abstract

At Google, we would like to develop a cost model that can accurately estimate the execution time of a machine learning model running on a Tensor Processing Unit (TPU). This cost model can be used by a compiler to make heuristic decisions, by an autotuner to find an optimal configuration of a specific program, and by Neural Architecture Search to co-optimize accuracy and inference time. However, building an accurate analytical cost model is challenging because of the complexity of modern processors. We propose to learn a cost model using a neural network. Our cost model uses a feedforward neural network to predict execution time from a graph embedding based on GraphSAGE. Our model’s mean predictions are within 13% of the actual execution time.
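The abstract describes the model only at a high level: GraphSAGE produces an embedding of the program graph, and a feedforward network maps that embedding to a predicted execution time. The sketch below is a minimal, untrained illustration of that pipeline. Several details are assumptions that do not come from the paper: the per-op feature vectors, the mean aggregator with L2 normalization, treating edges as undirected, mean pooling over nodes to form the graph embedding, the layer sizes, and a softplus output that keeps predictions positive. The names `SageLayer` and `CostModel` are hypothetical.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, v):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def softplus(x):
    # Numerically stable log(1 + e^x); assumed here to keep the output positive.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


class SageLayer:
    """One GraphSAGE-style layer with a mean aggregator (assumed variant):
    h_v' = normalize(ReLU(W [h_v ; mean of h_u over neighbors u]))."""

    def __init__(self, in_dim, out_dim, rng):
        self.in_dim = in_dim
        self.W = rand_matrix(out_dim, 2 * in_dim, rng)

    def __call__(self, h, neighbors):
        out = []
        for v, hv in enumerate(h):
            nbrs = neighbors[v]
            if nbrs:
                agg = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(self.in_dim)]
            else:
                agg = [0.0] * self.in_dim
            z = relu(matvec(self.W, hv + agg))  # hv + agg concatenates the two lists
            norm = math.sqrt(sum(x * x for x in z)) or 1.0
            out.append([x / norm for x in z])
        return out


class CostModel:
    """Hypothetical cost model: two SAGE layers, mean pooling, then a small FFN."""

    def __init__(self, feat_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)  # random, untrained weights for illustration
        self.sage1 = SageLayer(feat_dim, hidden_dim, rng)
        self.sage2 = SageLayer(hidden_dim, hidden_dim, rng)
        self.W1 = rand_matrix(hidden_dim, hidden_dim, rng)
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]

    def predict(self, features, edges):
        n = len(features)
        neighbors = [[] for _ in range(n)]
        for src, dst in edges:  # edges treated as undirected (assumption)
            neighbors[src].append(dst)
            neighbors[dst].append(src)
        h = self.sage2(self.sage1(features, neighbors), neighbors)
        graph_emb = [sum(col) / n for col in zip(*h)]  # mean-pool node embeddings
        hidden = relu(matvec(self.W1, graph_emb))
        return softplus(sum(w * x for w, x in zip(self.w2, hidden)))


if __name__ == "__main__":
    # Three hypothetical ops in a chain, each with two illustrative features.
    features = [[1.0, 0.2], [0.5, 0.9], [0.3, 0.4]]
    edges = [(0, 1), (1, 2)]
    model = CostModel(feat_dim=2)
    print(model.predict(features, edges))  # arbitrary positive value; weights are untrained
```

Because both the neighbor aggregation and the graph-level pooling are means, the prediction does not depend on the order in which ops are listed. This is one reason a GraphSAGE-style embedding suits compiler graphs whose node numbering is arbitrary. In practice the weights would be fit to measured TPU execution times; that training loop is omitted here.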