Personalized Speech Recognition On Mobile Devices

Ian McGraw; Rohit Prabhavalkar; Raziel Alvarez; Montse Gonzalez Arenas; Kanishka Rao; David Rybach; Ouais Alsharif; Hasim Sak; Alexander Gruenstein; Françoise Beaufays; Carolina Parada

Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)

Abstract

We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time.
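The abstract mentions a quantized acoustic model but does not specify the quantization scheme. As a rough illustration of why quantization shrinks the memory footprint, the sketch below maps floating-point weights to 8-bit integer codes with a simple uniform (linear) scheme. The function names `quantize_uint8` and `dequantize_uint8`, the 8-bit width, and the example weights are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def quantize_uint8(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 255] with a uniform (linear) scheme.

    Hypothetical helper: the paper's actual quantization scheme may differ.
    """
    w_min, w_max = min(weights), max(weights)
    # Guard against a constant vector, where the range would be zero.
    scale = (w_max - w_min) / 255.0 or 1.0
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize_uint8(codes: List[int], scale: float, w_min: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [c * scale + w_min for c in codes]


# Made-up example weights, not values from any real model.
weights = [-0.50, -0.10, 0.00, 0.25, 0.77]
codes, scale, w_min = quantize_uint8(weights)
restored = dequantize_uint8(codes, scale, w_min)

# The worst-case reconstruction error is about half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, max_error)
```

Each code fits in one byte instead of the four bytes of a 32-bit float, so under these assumptions weight storage drops by roughly a factor of four, at the cost of a reconstruction error bounded by about half of `scale`.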
