Jump to Content

Automatically batching control-intensive programs for modern accelerators

Alexey Radul
Dougal Maclaurin
Third Conference on Systems and Machine Learning, Austin, TX (2020)

Abstract

We present a general approach to batching arbitrary computations for GPU and TPU accelerators. We demonstrate the effectiveness of our method with orders-of-magnitude speedups on the No U-Turn Sampler (NUTS), a workhorse algorithm in Bayesian statistics. The central challenge of batching NUTS and other Markov chain Monte Carlo algorithms is data-dependent control flow and recursion. We overcome this by mechanically transforming a single-example implementation into a form that explicitly tracks the current program point for each batch member, and only steps forward those in the same place. We present two different batching algorithms: a simpler, previously published one that inherits recursion from the host Python, and a more complex, novel one that implmenents recursion directly and can batch across it. We implement these batching methods as a general program transformation on Python source. Both the batching system and the NUTS implementation presented here are available as part of the popular TensorFlow Probability software package.