A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers

Amit Sabne; Berkin Ilbeyi; Bjarke Roune; Blake Hechtman; Christof Angermueller; Emma Wang; Karthik Srinivasa Murthy; Ketan Mandke; Mangpo Phothilimthana; Mike Burrows; Nikhil Sarda; Rezsa Farahani; Samuel J. Kaufman; Shen Wang; Sudip Roy; Yanqi Zhou; Yuanzhong Xu

PACT (2021)

Abstract

Search-based techniques have proven effective at solving complex optimization problems that arise in domain-specific compilers for machine learning (ML). Unfortunately, two limitations impede the deployment of such techniques in production compilers. First, prior work requires factoring a computation graph into smaller subgraphs over which search is applied. This decomposition is not only non-trivial but also significantly limits the scope of optimization. Second, prior work requires search to be applied at a single stage of the compilation flow, which does not fit the multi-stage, layered architecture of most production ML compilers.

This paper presents XTAT, an autotuner for production ML compilers that can tune both graph-level and subgraph-level optimizations across multiple compilation stages. XTAT applies XTAT-M, a flexible search methodology that defines a search formulation for joint optimizations by accurately modeling the interactions between different compiler passes. XTAT tunes tensor layouts, operator fusion decisions, tile sizes, and code generation parameters in XLA, a production ML compiler, using various search strategies. In an evaluation across 150 ML training and inference models on Tensor Processing Units (TPUs) at Google, XTAT delivers an execution time speedup of up to 2.4x, and 5% on average, over the heavily optimized XLA compiler.
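To illustrate why tuning decisions jointly across compiler passes can matter, the following is a minimal toy sketch, not XTAT's actual algorithm or any XLA API. Everything in it is an assumption made for illustration: the two-stage search space (`LAYOUTS`, `TILE_SIZES`), the cost table `TOY_COST` standing in for measured execution time, and the function names. Exhaustive enumeration is used here only as a stand-in for the search strategies the abstract mentions.

```python
import itertools

# Hypothetical search space: one decision per compiler stage.
LAYOUTS = ["row_major", "col_major"]  # stage 1: tensor layout
TILE_SIZES = [8, 16, 32]              # stage 2: tile size

# Made-up cost table standing in for measured execution time (arbitrary units).
# The best tile size depends on the chosen layout; this interaction between
# stages is what a joint search can exploit.
TOY_COST = {
    ("row_major", 8): 10.0, ("row_major", 16): 9.0, ("row_major", 32): 9.5,
    ("col_major", 8): 11.0, ("col_major", 16): 12.0, ("col_major", 32): 7.0,
}


def measure(layout, tile):
    """Return the toy cost of one (layout, tile size) configuration."""
    return TOY_COST[(layout, tile)]


def tune_stage_by_stage(default_tile=8):
    """Tune each stage in isolation: fix the layout first, then the tile size."""
    layout = min(LAYOUTS, key=lambda l: measure(l, default_tile))
    tile = min(TILE_SIZES, key=lambda t: measure(layout, t))
    return (layout, tile), measure(layout, tile)


def tune_jointly():
    """Search over both stages at once by enumerating every combination."""
    best = min(itertools.product(LAYOUTS, TILE_SIZES),
               key=lambda config: measure(*config))
    return best, measure(*best)


if __name__ == "__main__":
    print("stage by stage:", tune_stage_by_stage())
    print("joint:         ", tune_jointly())
```

In this toy table, the stage-by-stage tuner commits to `row_major` because it looks better under the default tile size, and so never reaches the cheaper `col_major` configuration that the joint search finds. Real search spaces are far too large to enumerate, which is why a practical autotuner would rely on smarter search strategies.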
