Learned TPU Cost Model for XLA Tensor Programs

Mangpo Phothilimthana
Mike Burrows
Samuel J. Kaufman
Workshop on ML for Systems at NeurIPS (2019)

Abstract

At Google, we would like to develop a cost model that can accurately estimate the
execution time of a machine learning model running on a Tensor Processing Unit
(TPU). This cost model can be used by a compiler to make heuristic decisions,
by an autotuner to find an optimal configuration of a specific program, and by
Neural Architecture Search to co-optimize accuracy and inference time. However,
building an accurate analytical cost model is challenging because of the complexity
of modern processors.

We propose to learn a cost model using a neural network. Our cost model uses
a feedforward neural network to predict execution time from a graph embedding
based on GraphSAGE. Our model’s mean predictions are within 13% of the actual
execution time.
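
The pipeline described above (node features, GraphSAGE-style embedding, feedforward regressor) can be illustrated with a minimal sketch. This is not the authors' implementation. The mean aggregator, the mean pooling over nodes, the layer sizes, the node features, and every name (`sage_layer`, `predict_time`, `params`) are assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these from measured runs."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sage_layer(h, neighbors, W_self, W_neigh):
    """One GraphSAGE-style layer with a mean aggregator (an assumed choice)."""
    out = []
    for v, h_v in enumerate(h):
        nbrs = neighbors[v]
        if nbrs:
            mean = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(len(h_v))]
        else:
            mean = [0.0] * len(h_v)  # isolated node: no neighbor signal
        z = [a + b for a, b in zip(matvec(W_self, h_v), matvec(W_neigh, mean))]
        out.append(relu(z))
    return out


def predict_time(features, neighbors, params):
    """Embed each node, pool to one graph vector, then regress a scalar."""
    h = features
    for W_self, W_neigh in params["sage"]:
        h = sage_layer(h, neighbors, W_self, W_neigh)
    dim = len(h[0])
    graph_vec = [sum(row[i] for row in h) / len(h) for i in range(dim)]  # mean pooling (assumed)
    hidden = relu(matvec(params["W1"], graph_vec))
    return matvec(params["W2"], hidden)[0]


# Hypothetical 3-op program graph: ops 0 and 1 both feed op 2.
# Each feature vector is made up: [is_matmul, is_add, log10(output elements)].
features = [[1.0, 0.0, 4.0], [1.0, 0.0, 3.0], [0.0, 1.0, 4.0]]
neighbors = [[2], [2], [0, 1]]  # treated as undirected adjacency

rng = random.Random(0)
params = {
    "sage": [
        (random_matrix(4, 3, rng), random_matrix(4, 3, rng)),
        (random_matrix(4, 4, rng), random_matrix(4, 4, rng)),
    ],
    "W1": random_matrix(4, 4, rng),
    "W2": random_matrix(1, 4, rng),
}

prediction = predict_time(features, neighbors, params)
print("untrained prediction:", prediction)
```

Because the weights are untrained, the printed value carries no meaning; the sketch only shows how per-operation features and graph structure flow into a single scalar estimate. In a trained setting, the parameters would be fitted by regression against measured execution times.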