Speech/Nonspeech Segmentation in Web Videos

Ananya Misra

Proceedings of InterSpeech 2012

Abstract

Speech transcription of web videos requires first detecting segments with transcribable speech. We refer to this as segmentation. Commonly used segmentation techniques are inadequate for domains such as YouTube, where videos may have a large variety of background and recording conditions. In this work, we investigate alternative audio features and a discriminative classifier, which together yield a lower frame error rate (25.3%) on YouTube videos compared to the commonly used Gaussian mixture models trained on cepstral features (30.6%). The alternative audio features perform particularly well in noisy conditions.
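
The abstract reports results as a frame error rate. As a minimal sketch, assuming the usual reading of that metric (the fraction of audio frames whose predicted speech/nonspeech label differs from the reference label), it could be computed as below. The function name and the toy label sequences are hypothetical illustrations, not taken from the paper, and the paper's exact scoring protocol may differ.

```python
def frame_error_rate(reference, predicted):
    """Return the fraction of frames whose predicted label differs from the reference.

    Assumed definition for illustration; the paper may score frames differently.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("label sequences must be non-empty")
    errors = sum(r != p for r, p in zip(reference, predicted))
    return errors / len(reference)


# Toy per-frame labels: 1 = speech, 0 = nonspeech (made-up data, not from the paper).
reference = [1, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 0, 0, 1, 1, 1, 1]

fer = frame_error_rate(reference, predicted)
print(f"frame error rate: {fer:.1%}")
```

Under this definition, the reported 25.3% and 30.6% would mean that roughly one in four and nearly one in three frames, respectively, were labeled incorrectly.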
