Oleg Rybakov

I am a software engineering manager at Google AI, working on optimization of speech and vision models for mobile and cloud applications. I have industry research experience in image/video enhancement, computer vision, recommender systems, distributed training at scale, and speech processing.
Authored Publications
    Real-time Speech Frequency Bandwidth Extension
    2021 IEEE International Conference on Acoustics, Speech and Signal Processing (to appear)
    Abstract: In this paper we propose a lightweight model that performs frequency bandwidth extension of speech signals, increasing the sampling frequency from 8kHz to 16kHz, while restoring the high-frequency content to a level that is indistinguishable from the original samples at 16kHz. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which adopts a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a version of SEANet that can be deployed on device in streaming mode, achieving an architecture latency of 16ms. When profiled on a single mobile CPU, processing one 16ms frame takes only 1.5ms, so that the total latency is compatible with deployment in bi-directional voice communication systems.
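The real-time claim in the abstract can be checked with simple arithmetic: the model buffers one 16ms frame and spends 1.5ms processing it, so the real-time factor is well below 1. A minimal sketch, using only the figures quoted above (the helper name is illustrative, not from the paper):

```python
# Real-time budget check for the streaming SEANet figures quoted above.
FRAME_MS = 16.0  # architecture latency: one 16 ms audio frame
PROC_MS = 1.5    # measured processing time per frame on one mobile CPU

def real_time_factor(proc_ms: float, frame_ms: float) -> float:
    """Processing time divided by audio duration; < 1.0 means real time."""
    return proc_ms / frame_ms

rtf = real_time_factor(PROC_MS, FRAME_MS)
print(f"RTF = {rtf:.3f}")  # 1.5 / 16 ~= 0.094, comfortably real time
# Total one-way latency = frame buffering + processing time:
print(f"latency = {FRAME_MS + PROC_MS:.1f} ms")  # 17.5 ms
```

An RTF around 0.09 leaves ample headroom for other on-device workloads, which is why the paper argues the model fits bi-directional voice communication.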
    Streaming keyword spotting on mobile devices
    Natasha Kononenko
    Niranjan Subrahmanya
    Mirkó Visontai
    Stella Marie Laurenzo
    INTERSPEECH, ISCA, Shanghai China (2020)
    Abstract: In this work we explore the latency and accuracy of keyword spotting (KWS) models in streaming and non-streaming modes on mobile phones. NN model conversion from non-streaming mode (the model receives the whole input sequence and then returns the classification result) to streaming mode (the model receives a portion of the input sequence and classifies it incrementally) may require manual model rewriting. We address this by designing a TensorFlow/Keras-based library which allows automatic conversion of non-streaming models to streaming ones with minimal effort. With this library we benchmark multiple KWS models in both streaming and non-streaming modes on mobile phones and demonstrate different tradeoffs between latency and accuracy. We also explore novel KWS models with multi-head attention which reduce the classification error over the state-of-the-art by 10% on the Google Speech Commands dataset V2. The streaming library with all experiments is open-sourced.
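The core idea behind the non-streaming-to-streaming conversion the abstract describes is that causal layers only need a small internal state: a causal 1-D convolution, for example, must remember the last kernel_size - 1 inputs between calls. A minimal pure-Python sketch of that idea (all names here are illustrative, not the open-sourced library's API):

```python
# Sketch: converting a causal 1-D convolution to streaming mode by
# carrying the last (kernel_size - 1) samples as internal state, so
# each new chunk can be processed incrementally.
from collections import deque

class StreamingConv1D:
    def __init__(self, kernel):
        self.kernel = kernel
        # State holds the previous kernel_size - 1 samples,
        # zero-initialized like left padding in the non-streaming model.
        self.state = deque([0.0] * (len(kernel) - 1),
                           maxlen=len(kernel) - 1)

    def stream(self, frame):
        """Process one chunk of samples, updating internal state."""
        out = []
        for x in frame:
            window = list(self.state) + [x]
            out.append(sum(w * v for w, v in zip(self.kernel, window)))
            self.state.append(x)
        return out

def non_streaming_conv(kernel, signal):
    """Reference: causal conv over the whole sequence, zero-padded."""
    padded = [0.0] * (len(kernel) - 1) + list(signal)
    return [sum(w * v for w, v in zip(kernel, padded[i:i + len(kernel)]))
            for i in range(len(signal))]

# Streaming the signal in two chunks matches the one-shot result:
kernel = [0.25, 0.5, 0.25]
signal = [1.0, 2.0, 3.0, 4.0]
conv = StreamingConv1D(kernel)
streamed = conv.stream(signal[:2]) + conv.stream(signal[2:])
assert streamed == non_streaming_conv(kernel, signal)
```

Because the incremental outputs match the whole-sequence outputs exactly, a model built from such layers can be run frame by frame on a phone without retraining, which is the tradeoff the paper benchmarks.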