Compression of End-to-End Models

Ruoming Pang
Suyog Gupta
Shuyuan Zhang
Chung-Cheng Chiu


End-to-end models, which are trained to directly output grapheme or word-piece targets, have been demonstrated to be competitive with conventional speech recognition models. Such models do not require additional resources for decoding and are typically much smaller than conventional models, which makes them particularly attractive for on-device speech recognition, where both a small memory footprint and low power consumption are critical. With these constraints in mind, in this work we consider the problem of compressing end-to-end models with the goal of minimizing the number of model parameters without sacrificing model accuracy. We explore matrix factorization, knowledge distillation and parameter sparsity to determine the most effective method given a fixed parameter budget.
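To make "a fixed parameter budget" concrete, the sketch below counts how many parameters a single weight matrix uses under two of the techniques named above: low-rank matrix factorization (replacing an m x n matrix W with U (m x r) times V (r x n)) and parameter sparsity (keeping only a fraction of the weights). This is a minimal illustration under stated assumptions, not the authors' method: the helper names, the 1024 x 1024 layer size and the 25% budget are hypothetical, and the sparsity count ignores the index storage a real sparse format would need. Knowledge distillation is not modeled here, since it trains a smaller dense student whose size is chosen directly.

```python
# Hypothetical helpers for parameter accounting of one m x n weight matrix.
# Layer sizes and budgets used with them are illustrative, not from the paper.

def dense_params(m: int, n: int) -> int:
    """Parameters in an uncompressed dense m x n weight matrix."""
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    """Parameters when W (m x n) is factorized as U (m x r) @ V (r x n)."""
    return r * (m + n)


def max_rank_for_budget(m: int, n: int, budget: int) -> int:
    """Largest rank r whose factorization satisfies r * (m + n) <= budget."""
    return budget // (m + n)


def sparsity_for_budget(m: int, n: int, budget: int) -> float:
    """Fraction of weights that must be pruned to keep at most `budget` nonzeros."""
    return 1.0 - budget / (m * n)


def sparse_params(m: int, n: int, sparsity: float) -> int:
    """Nonzero weights left after pruning a fraction `sparsity` of the matrix.

    Assumption: index/mask storage overhead is ignored.
    """
    return round(m * n * (1.0 - sparsity))


if __name__ == "__main__":
    m, n = 1024, 1024                     # hypothetical layer shape
    budget = dense_params(m, n) // 4      # hypothetical 25% parameter budget
    r = max_rank_for_budget(m, n, budget)
    s = sparsity_for_budget(m, n, budget)
    print(f"dense: {dense_params(m, n)} params")
    print(f"low-rank: rank {r}, {low_rank_params(m, n, r)} params")
    print(f"sparse: {s:.0%} pruned, {sparse_params(m, n, s)} params")
```

Under this toy accounting, a 25% budget on a 1024 x 1024 layer allows either a rank-128 factorization or 75% sparsity; which choice preserves accuracy better is the kind of empirical question the comparison above addresses.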
