- Jingyue Wu
- Artem Belevich
- Eli Bendersky
- Mark Heffernan
- Chris Leary
- Jacques Pienaar
- Bjarke Roune
- Rob Springer
- Xuetian Weng
- Robert Hundt
Abstract
Graphics Processing Units have emerged as powerful accelerators for massively parallel, numerically intensive workloads. The two dominant software models for these devices are NVIDIA’s CUDA and the cross-platform OpenCL standard. Until now, there has not been a fully open-source compiler targeting the CUDA environment, hampering general compiler and architecture research and making deployment difficult in datacenter or supercomputer environments. In this paper, we present gpucc, an LLVM-based, fully open-source, CUDA compatible compiler for high performance computing. It performs various general and CUDA-specific optimizations to generate high performance code. The Clang-based frontend supports modern language features such as those in C++11 and C++14. Compile time is 8% faster than NVIDIA’s toolchain (nvcc) and it reduces compile time by up to 2.4x for pathological compilations (>100 secs), which tend to dominate build times in parallel build environments. Compared to nvcc, gpucc’s runtime performance is on par for several open-source benchmarks, such as Rodinia (0.8% faster), SHOC (0.5% slower), or Tensor (3.7% faster). It outperforms nvcc on internal large-scale end-to-end benchmarks by up to 51.0%, with a geometric mean of 22.9%.
Research Areas
Learn more about how we do research
We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work