Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppression

Hyung-Min Park
Matthew Maciejewski
Chanwoo Kim
Richard M. Stern
INTERSPEECH 2014, pp. 2715-2718


The precedence effect describes the ability of the auditory system to suppress the later-arriving components of sound in a reverberant environment, maintaining the perceived arrival azimuth of a sound in the direction of the actual source, even though later reverberant components may arrive from other directions. It is also widely believed that precedence-like processing can improve speech intelligibility, as well as the accuracy of speech recognition systems, in reverberant environments. While the mechanisms underlying the precedence effect have traditionally been assumed to be binaural in nature, it is also possible that the suppression of later components takes place monaurally, and that the suppression of the later-arriving components of the spatial image is a consequence of this more peripheral processing. This paper compares the potential contributions of onset enhancement (and consequent steady-state suppression) of the envelopes of subband components of speech at both the monaural and binaural levels. Experimental results indicate that substantial improvement in recognition accuracy can be obtained in reverberant environments if the feature extraction includes both onset enhancement and binaural interaction. Recognition accuracy appears to be relatively unaffected by the stage in the suppression processing at which the binaural interaction takes place.
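The abstract does not give the suppression formula, so the following is only a minimal sketch of one plausible form of onset enhancement with steady-state suppression on subband power envelopes. It assumes a first-order running average per subband that is subtracted from the current frame, with a small floor to keep values positive. The function names, the `forgetting` and `floor` parameters and their default values, and the use of simple left/right averaging as the "binaural interaction" are all illustrative assumptions, not the authors' confirmed method.

```python
import numpy as np


def suppress_steady_state(power, forgetting=0.4, floor=0.01):
    """Emphasise onsets in subband power envelopes (illustrative sketch).

    power: array of shape (frames, subbands) with nonnegative values.
    forgetting, floor: hypothetical parameters chosen for illustration.
    """
    power = np.asarray(power, dtype=float)
    out = np.empty_like(power)
    # Running (lowpassed) envelope per subband, initialised to the first frame.
    running = power[0].copy()
    for m in range(power.shape[0]):
        running = forgetting * running + (1.0 - forgetting) * power[m]
        # Subtract the slowly varying part; keep a small fraction as a floor.
        out[m] = np.maximum(power[m] - running, floor * power[m])
    return out


def binaural_features(left, right, combine_first=True, **kwargs):
    """Combine two ear signals before or after suppression.

    Averaging the two channels is an assumed stand-in for binaural
    interaction; the abstract does not specify the combination rule.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if combine_first:
        return suppress_steady_state(0.5 * (left + right), **kwargs)
    return 0.5 * (
        suppress_steady_state(left, **kwargs)
        + suppress_steady_state(right, **kwargs)
    )
```

For a step-shaped envelope, this sketch leaves a large value at the onset frame and decays toward the floor while the input stays constant, which is the qualitative behaviour the abstract calls onset enhancement and steady-state suppression. The `combine_first` flag mirrors the abstract's question of where in the processing chain binaural interaction occurs. Because of the `max` nonlinearity, the two orderings generally give different outputs unless the two channels are identical.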