Combining Monaural and Binaural Evidence for Reverberant Speech Segregation

John Woodruff
Eric Fosler-Lussier
DeLiang Wang
Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), ISCA (2010), pp. 406-409

Abstract

Most existing binaural approaches to speech segregation rely on spatial filtering. In environments with minimal reverberation, and when sources are well separated in space, spatial filtering can achieve excellent results. In everyday environments, however, performance degrades substantially. To address these limitations, we incorporate monaural analysis within a binaural segregation system. We use monaural cues to perform both local and across-frequency grouping of mixture components, allowing for a more robust application of spatial filtering. We propose a novel framework in which we combine monaural grouping evidence and binaural localization evidence in a linear model for the estimation of the ideal binary mask. Results indicate that, with appropriately designed features that capture both monaural and binaural evidence, an extremely simple model achieves a signal-to-noise ratio improvement of up to 4 dB relative to using spatial filtering alone.
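
As a rough illustration of the two ideas named in the abstract, the sketch below computes an ideal binary mask from known target and interference energies and then estimates a mask by thresholding a weighted sum of two per-unit evidence scores. It is not the authors' implementation: the 0 dB local criterion, the function names, the feature values, the weights, and the bias are all assumptions chosen for illustration, and the abstract does not specify how the features are designed or how the model is trained.

```python
def ideal_binary_mask(target_energy, interference_energy, threshold_db=0.0):
    """Label a time-frequency unit 1 when target energy exceeds interference
    energy by more than threshold_db, else 0.

    Inputs are 2-D lists (frames x frequency channels) of non-negative
    energies. The 0 dB default is an assumed local criterion.
    """
    # Convert the dB threshold to a linear energy ratio.
    ratio = 10.0 ** (threshold_db / 10.0)
    return [
        [1 if t > i * ratio else 0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_energy, interference_energy)
    ]


def linear_mask_estimate(monaural, binaural, w_monaural, w_binaural, bias):
    """Estimate a binary mask by thresholding a linear combination of a
    monaural grouping score and a binaural localization score at zero.

    The two score arrays, the weights, and the bias are hypothetical
    placeholders for whatever features and fitted parameters a system uses.
    """
    return [
        [1 if w_monaural * m + w_binaural * b + bias > 0.0 else 0
         for m, b in zip(m_row, b_row)]
        for m_row, b_row in zip(monaural, binaural)
    ]


# Toy example: 2 frames x 2 channels of made-up energies.
target = [[4.0, 1.0], [0.5, 9.0]]
interference = [[1.0, 2.0], [0.5, 3.0]]
ibm = ideal_binary_mask(target, interference)

# Toy evidence scores for a single frame, with illustrative weights.
estimate = linear_mask_estimate([[0.9, 0.2]], [[0.8, 0.1]], 1.0, 1.0, -1.0)
```

In a setting like the one the abstract describes, the ideal mask would serve as the estimation target, and the weights of the linear model would be fitted so that the thresholded score matches it as closely as possible.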
