Ouais Alsharif
I joined Google in 2014. I am interested in and curious about many fields, and I work at the interface of research and engineering. At Alphabet, I have worked on speech processing and synthesis, natural language, and self-driving cars.
Authored Publications
On The Compression Of Recurrent Neural Networks With An Application To LVCSR Acoustic Modeling For Embedded Speech Recognition
Antoine Bruguier
Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
We study the problem of compressing recurrent neural networks (RNNs). In particular, we focus on the compression of RNN acoustic models, which are motivated by the goal of building compact and accurate speech recognition systems which can be run efficiently on mobile devices. In this work, we present a technique for general recurrent model compression that jointly compresses both recurrent and non-recurrent inter-layer weight matrices. We find that the proposed technique allows us to reduce the size of our Long Short-Term Memory (LSTM) acoustic model to a third of its original size with negligible loss in accuracy.
Personalized Speech Recognition On Mobile Devices
Raziel Alvarez
David Rybach
Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE (2016)
We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time.
Long-Short Term Memory Neural Network for Keyboard Gesture Recognition
Thomas Breuel
Johan Schalkwyk
International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)