High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

William S. Moses; Ivan R. Ivanov; Jens Domke; Toshio Endo; Johannes Doerfert; Oleksandr Zinenko

Proceedings of the Intl Conference on Principles and Practice of Parallel Programming (PPoPP), ACM (2023) (to appear)

Abstract

While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance portability require manual and costly application porting to yet another programming model.

We propose an alternative approach that automatically translates programs written in one programming model (CUDA) into another (CPU threads), built on Polygeist/MLIR. Our approach includes a representation of parallel constructs that allows conventional compiler transformations to apply transparently and without modification, and that enables parallelism-specific optimizations. We evaluate our framework by transpiling and optimizing the CUDA Rodinia benchmark suite for a multi-core CPU and achieve a 76% geomean speedup over handwritten OpenMP code. Further, we show how CUDA kernels from PyTorch can efficiently run and scale on the CPU-only supercomputer Fugaku without user intervention. Our PyTorch compatibility layer, which uses transpiled CUDA PyTorch kernels, outperforms the PyTorch CPU native backend by 2.7x.
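
To make the idea of running a CUDA kernel on CPU threads concrete, the following is a minimal hand-written sketch, not output of the framework described above. It assumes a simple SAXPY kernel with no barriers (`__syncthreads`) and no shared memory, so the threads of a block can be run one after another in a plain loop. The names `saxpy_block`, `saxpy_cpu`, and `num_workers` are hypothetical and chosen only for illustration.

```cpp
// CUDA original, shown for reference only (not compiled here):
//   __global__ void saxpy(int n, float a, const float* x, float* y) {
//     int i = blockIdx.x * blockDim.x + threadIdx.x;
//     if (i < n) y[i] = a * x[i] + y[i];
//   }
//   saxpy<<<grid_dim, block_dim>>>(n, a, x, y);

#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical CPU counterpart of one CUDA block: threadIdx.x becomes a
// loop induction variable and the kernel body becomes the loop body.
// This is only valid here because the kernel contains no barriers.
void saxpy_block(int block_idx, int block_dim, int n, float a,
                 const float* x, float* y) {
  for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
    int i = block_idx * block_dim + thread_idx;
    if (i < n) y[i] = a * x[i] + y[i];
  }
}

// Hypothetical launcher: blocks are independent, so they are distributed
// round-robin over a small pool of CPU threads.
void saxpy_cpu(int grid_dim, int block_dim, int n, float a,
               const float* x, float* y,
               unsigned num_workers = std::thread::hardware_concurrency()) {
  assert(grid_dim > 0 && block_dim > 0);
  if (num_workers == 0) num_workers = 1;  // hardware_concurrency() may return 0
  num_workers = std::min<unsigned>(num_workers, static_cast<unsigned>(grid_dim));

  std::vector<std::thread> workers;
  for (unsigned w = 0; w < num_workers; ++w) {
    workers.emplace_back([=] {
      for (int b = static_cast<int>(w); b < grid_dim;
           b += static_cast<int>(num_workers)) {
        saxpy_block(b, block_dim, n, a, x, y);
      }
    });
  }
  for (auto& t : workers) t.join();
}
```

Calling `saxpy_cpu(grid_dim, block_dim, n, a, x, y)` with the same launch dimensions as the CUDA launch should produce the same `y` for this kernel. Kernels that synchronize threads within a block need more care than this loop-based sketch provides.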

Research Areas

Software systems
