ReLeQ: A Reinforcement Learning Approach for Automatic Deep Quantization of Neural Networks
Abstract
Deep quantization can significantly reduce DNN computation and storage by decreasing the bitwidth of network encodings. However, without arduous manual effort, deep quantization can lead to significant accuracy loss, calling its utility into question. We propose a systematic approach to this problem that automates the discovery of quantization levels through an end-to-end deep reinforcement learning framework (ReLeQ). The framework exploits the sample efficiency of Proximal Policy Optimization (PPO) to explore the exponentially large space of possible assignments of quantization levels to layers. We show how ReLeQ balances speed and quality, providing heterogeneous bitwidth assignments for a large variety of deep networks that virtually preserve accuracy (0.3% loss) while minimizing computation and storage costs. With these quantized DNNs, ReLeQ enables both conventional hardware and a custom DNN accelerator to achieve a 2.2× speedup over 8-bit execution.
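The per-layer bitwidth search the abstract describes can be sketched as a toy search loop. This is an illustrative assumption, not ReLeQ's implementation: the candidate bitwidths, the reward formula trading accuracy loss against cost, the accuracy-drop proxy, and the use of random sampling as a stand-in for a PPO agent are all ours.

```python
import random

LAYERS = 5                    # number of layers in a hypothetical network
BITWIDTHS = [2, 3, 4, 5, 8]   # candidate quantization levels (illustrative)

def toy_accuracy_drop(assignment):
    # Stand-in for measured validation accuracy loss after quantization:
    # here, fewer bits per layer simply means more loss (illustrative only).
    return sum(0.01 * (8 - b) for b in assignment) / len(assignment)

def reward(assignment, acc_drop):
    # Reward balances accuracy preservation against compute/storage cost,
    # relative to uniform 8-bit execution (exact formula is an assumption).
    cost = sum(assignment) / (8 * len(assignment))
    return (1.0 - acc_drop) - cost

def random_search(episodes=200, seed=0):
    # Random policy standing in for the PPO agent: each episode samples a
    # per-layer bitwidth assignment and the best-scoring one is retained.
    rng = random.Random(seed)
    best, best_r = None, float("-inf")
    for _ in range(episodes):
        assignment = [rng.choice(BITWIDTHS) for _ in range(LAYERS)]
        r = reward(assignment, toy_accuracy_drop(assignment))
        if r > best_r:
            best, best_r = assignment, r
    return best, best_r

best, best_r = random_search()
print(best, round(best_r, 3))
```

In the paper's setting, `toy_accuracy_drop` would be replaced by short fine-tuning and validation of the quantized network, and the random sampler by a PPO policy whose sample efficiency makes exploring the exponentially large assignment space tractable.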