Learned TPU Cost Model for XLA Tensor Programs
Abstract
At Google, we would like to develop a cost model that can accurately estimate the
execution time of a machine learning model running on a Tensor Processing Unit
(TPU). This cost model can be used by a compiler to make heuristic decisions,
by an autotuner to find an optimal configuration of a specific program, and by
Neural Architecture Search to co-optimize accuracy and inference time. However,
building an accurate analytical cost model is challenging because of the complexity
of modern processors.
We propose to learn a cost model using a neural network. Our cost model uses
a feedforward neural network to predict execution time from a graph embedding
based on GraphSAGE. Our model’s predictions are, on average, within 13% of the
actual execution time.
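To make the architecture concrete, the following is a minimal sketch of the described pipeline: a GraphSAGE-style mean-aggregation embedding of the op graph, mean-pooled into a graph embedding, followed by a small feedforward head that outputs a scalar execution-time prediction. All weight shapes, layer counts, and the toy graph are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def graphsage_mean_layer(node_feats, adj, w_self, w_neigh):
    """One GraphSAGE layer with mean aggregation:
    h_v = ReLU(W_self x_v + W_neigh * mean(x_u for u in N(v)))."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ node_feats) / np.maximum(deg, 1)
    h = node_feats @ w_self.T + neigh_mean @ w_neigh.T
    return np.maximum(h, 0.0)

def predict_runtime(node_feats, adj, params):
    """Embed the op graph with two GraphSAGE layers, mean-pool the node
    embeddings into a graph embedding, then apply a feedforward head."""
    h = graphsage_mean_layer(node_feats, adj, params["W1s"], params["W1n"])
    h = graphsage_mean_layer(h, adj, params["W2s"], params["W2n"])
    g = h.mean(axis=0)                       # graph-level embedding
    z = np.maximum(g @ params["Wf1"].T, 0)   # feedforward hidden layer
    return float(z @ params["wf2"])          # predicted execution time

# Toy op graph: 4 ops in a chain, 8-dim node features (hypothetical).
n, d, hdim = 4, 8, 16
feats = rng.normal(size=(n, d))
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i + 1, i] = 1.0  # op i+1 consumes op i's output

params = {
    "W1s": rng.normal(size=(hdim, d)),   "W1n": rng.normal(size=(hdim, d)),
    "W2s": rng.normal(size=(hdim, hdim)), "W2n": rng.normal(size=(hdim, hdim)),
    "Wf1": rng.normal(size=(hdim, hdim)), "wf2": rng.normal(size=hdim),
}
print(predict_runtime(feats, adj, params))
```

In practice the weights would be trained by regressing against measured TPU execution times; this untrained sketch only shows how graph structure and node features flow through the model.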