Audio Segmentation for Speech Recognition using Segment Features

Christian Gollan
Ralf Schlüter
Hermann Ney
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4197-4200

Abstract

Audio segmentation is an essential preprocessing step in several audio processing applications and has a significant impact on, for example, speech recognition performance. We introduce a novel framework that combines the advantages of different well-known segmentation methods. An automatically estimated log-linear segment model is used to determine the segmentation of an audio stream holistically, via a maximum a posteriori decoding strategy, instead of classifying change points locally. We compare our approach to other segmentation techniques in terms of speech recognition performance; the results indicate promising segmentation quality.
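
To illustrate the difference between holistic decoding and local change-point classification, the following minimal sketch scores whole segments with a log-linear model (a weighted sum of segment features) and finds the best-scoring segmentation of the complete stream by dynamic programming. It is not the authors' implementation: the two features (a per-segment bias and a within-segment scatter term), the scalar "frames", the hand-set weights, and the names `segment_score` and `map_segmentation` are all assumptions made for illustration. The abstract states that the model is estimated automatically, which this sketch does not attempt.

```python
from typing import List, Sequence, Tuple


def segment_score(frames: Sequence[float], start: int, end: int,
                  weights: Sequence[float]) -> float:
    """Log-linear score of the segment frames[start:end].

    The score is the dot product of a weight vector with segment features.
    Both features are hypothetical stand-ins, not those used in the paper.
    """
    seg = frames[start:end]
    n = len(seg)
    mean = sum(seg) / n
    scatter = sum((x - mean) ** 2 for x in seg)
    # Feature 1: constant 1.0 per segment (a negative weight penalises
    #            segmentations with many segments).
    # Feature 2: negative within-segment scatter (rewards homogeneity).
    features = (1.0, -scatter)
    return sum(w * f for w, f in zip(weights, features))


def map_segmentation(frames: Sequence[float], weights: Sequence[float],
                     max_len: int) -> List[Tuple[int, int]]:
    """Return the segmentation with the highest total score.

    Every boundary is chosen jointly by dynamic programming over the whole
    stream, rather than by deciding each candidate change point in isolation.
    Segments are half-open index pairs (start, end) of at most max_len frames.
    """
    total = len(frames)
    best = [float("-inf")] * (total + 1)  # best[t]: best score of frames[:t]
    back = [0] * (total + 1)              # back[t]: start of the last segment
    best[0] = 0.0
    for end in range(1, total + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + segment_score(frames, start, end, weights)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Trace the back-pointers from the end of the stream to recover segments.
    segments = []
    end = total
    while end > 0:
        start = back[end]
        segments.append((start, end))
        end = start
    segments.reverse()
    return segments
```

For example, with the assumed weights `(-1.0, 1.0)`, a toy stream of five frames at 0.0 followed by five frames at 10.0 is split at the level change: `map_segmentation([0.0] * 5 + [10.0] * 5, (-1.0, 1.0), 10)` returns `[(0, 5), (5, 10)]`. Because total scores of complete segmentations are compared, the bias term can suppress spurious boundaries that a purely local detector might emit.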
