Music Models for Music-Speech Separation

Thad Hughes; Trausti Kristjansson

ICASSP, IEEE (2012), pp. 4917-4920

Abstract

We consider the task of speech recognition in the presence of loud background music. We use model-based music-speech separation and train Gaussian mixture models (GMMs) for the music on the audio that precedes the speech. We show over 8% relative improvement in word error rate (WER) at 10 dB SNR for a real-world Voice Search ASR system.
We investigate the relationship between ASR accuracy, the amount of background music used as prologue, and the size of the music models. Our study shows that performance peaks when a music prologue of around 6 seconds is used to train the music model. We hypothesize that this is due to the dynamic nature of music and the structure of popular music. Adding more history beyond a certain point does not improve results. Additionally, we show that moderately sized 8-component music GMMs suffice to model this amount of music prologue.

Index Terms— ASR, noise robustness, noise reduction, non-stationary noise, music
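
The idea of fitting a small music model to a short prologue can be illustrated with a minimal sketch. The abstract does not specify the features, the training procedure, or how the model is used during separation, so everything below is an assumption made for illustration: the 10 ms frame hop, the 4-dimensional synthetic feature vectors standing in for real audio features, the diagonal covariances, and the plain EM loop. Only the 6-second prologue and the 8 mixture components come from the abstract. The helper names (`synthetic_prologue`, `fit_gmm`) are hypothetical.

```python
import math
import random

FRAME_RATE_HZ = 100    # assumption: 10 ms frame hop (not stated in the abstract)
PROLOGUE_SECONDS = 6   # from the abstract: performance peaks at around 6 seconds
NUM_COMPONENTS = 8     # from the abstract: 8-component music GMM


def synthetic_prologue(seed=1, dim=4):
    """Hypothetical stand-in for prologue features: points drawn around 3 centers."""
    rng = random.Random(seed)
    n_frames = FRAME_RATE_HZ * PROLOGUE_SECONDS
    centers = [[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(3)]
    return [[rng.gauss(c, 0.5) for c in rng.choice(centers)] for _ in range(n_frames)]


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm(frames, k=NUM_COMPONENTS, iters=20, var_floor=1e-3, seed=0):
    """Fit a diagonal GMM with plain EM.

    Returns (weights, means, variances, history), where history holds the
    average per-frame log-likelihood measured at the start of each iteration.
    """
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    # Initialize means from random frames and variances from the global variance.
    means = [list(f) for f in rng.sample(frames, k)]
    gmean = [sum(f[d] for f in frames) / n for d in range(dim)]
    gvar = [max(sum((f[d] - gmean[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k
    history = []
    eps = 1e-10  # keeps weights strictly positive so log(weight) is defined
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        resp, total = [], 0.0
        for x in frames:
            logs = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logs)
            total += norm
            resp.append([math.exp(v - norm) for v in logs])
        history.append(total / n)
        # M-step: re-estimate weights, means, and floored variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + eps
            weights[j] = nj / (n + k * eps)
            for d in range(dim):
                m = sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                v = sum(r[j] * (x[d] - m) ** 2 for r, x in zip(resp, frames)) / nj
                means[j][d] = m
                variances[j][d] = max(v, var_floor)
    return weights, means, variances, history


frames = synthetic_prologue()
weights, means, variances, history = fit_gmm(frames)
print(f"{len(frames)} frames, avg log-likelihood "
      f"{history[0]:.2f} -> {history[-1]:.2f}")
```

With these assumed settings the prologue yields 600 frames for 8 components, which gives a rough sense of how little data such a model is trained on. A real system would replace `synthetic_prologue` with features computed from the audio preceding the utterance, and would likely rely on an established GMM implementation rather than this hand-written loop.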
