Google Research

Compression of End-to-End Models

Abstract

End-to-end models, which are trained to directly output grapheme or word-piece targets, have been demonstrated to be competitive with conventional speech recognition models. Such models do not require additional resources for decoding and are typically much smaller than conventional models, which makes them particularly attractive for on-device speech recognition, where both a small memory footprint and low power consumption are critical. With these constraints in mind, we consider the problem of compressing end-to-end models with the goal of minimizing the number of model parameters without sacrificing model accuracy. We explore matrix factorization, knowledge distillation, and parameter sparsity to determine the most effective method given a fixed parameter budget.
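
To make the idea of a fixed parameter budget concrete, the following is a minimal sketch of the arithmetic behind two of the techniques named above. It assumes a single dense weight matrix; the 1024 x 1024 layer size, the budget, and all function names are hypothetical illustrations, not values or code from the paper. It also ignores bias terms and the index storage that a sparse format would need.

```python
# Illustrative parameter-budget arithmetic for one weight matrix.
# All sizes and names here are assumptions for the sake of the example.

def dense_params(rows: int, cols: int) -> int:
    """Parameters in an unfactored rows x cols weight matrix."""
    return rows * cols

def factored_params(rows: int, cols: int, rank: int) -> int:
    """Parameters after replacing W (rows x cols) with U (rows x rank) @ V (rank x cols)."""
    return rank * (rows + cols)

def max_rank_for_budget(rows: int, cols: int, budget: int) -> int:
    """Largest rank whose two factors fit within the parameter budget."""
    return budget // (rows + cols)

def sparsity_for_budget(rows: int, cols: int, budget: int) -> float:
    """Fraction of weights that must be zeroed so the nonzeros fit the budget."""
    return max(0.0, 1.0 - budget / dense_params(rows, cols))

if __name__ == "__main__":
    rows, cols = 1024, 1024          # hypothetical layer size
    budget = dense_params(rows, cols) // 4  # keep a quarter of the parameters
    rank = max_rank_for_budget(rows, cols, budget)
    print("dense parameters:", dense_params(rows, cols))
    print("max rank under budget:", rank)
    print("factored parameters:", factored_params(rows, cols, rank))
    print("required sparsity:", sparsity_for_budget(rows, cols, budget))
```

Under these assumptions, the same budget can be spent either on a low-rank pair of factors or on a sparse matrix of the original shape. Which choice preserves accuracy better is the empirical question the work investigates, and this sketch does not answer it.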
