Binaural processing for robust speech recognition of degraded speech

Anjali Menon; Chanwoo Kim; Umpei Kurokawa; Richard M. Stern

IEEE Automatic Speech Recognition and Understanding Workshop (2017)

Abstract

This paper discusses a new combination of techniques that improve the accuracy of speech recognition in adverse conditions using two microphones. Classic approaches to binaural speech processing use some form of cross-correlation over time across the two sensors to effectively isolate target speech from interferers. Several additional techniques using temporal and spatial masking have been proposed in the past to improve recognition accuracy in the presence of reverberation and interfering talkers. In this paper, we consider the use of cross-correlation across frequency over some limited range of frequency channels, in addition to the existing methods of monaural and binaural processing. This has the effect of locating and reinforcing coincident peaks across frequency over the representation of binaural interaction, and it provides local smoothing over the specified range of frequencies. Combined with the temporal and spatial masking techniques mentioned above, this leads to significant improvements in binaural speech recognition.
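
The two correlation stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the half-wave rectification, and the use of a geometric mean over neighboring channels are all assumptions made for illustration, and the inputs are assumed to be band-pass filtered signals of shape (channels, samples), one array per microphone.

```python
import numpy as np

def interaural_crosscorr(left, right, max_lag):
    """Cross-correlation over time between the two sensors, per channel.

    left, right: arrays of shape (channels, samples).
    Returns an array of shape (channels, 2 * max_lag + 1); column j
    corresponds to lag j - max_lag.
    """
    n_ch, n = left.shape
    out = np.zeros((n_ch, 2 * max_lag + 1))
    for j, lag in enumerate(range(-max_lag, max_lag + 1)):
        if lag >= 0:
            out[:, j] = np.sum(left[:, lag:] * right[:, :n - lag], axis=1)
        else:
            out[:, j] = np.sum(left[:, :n + lag] * right[:, -lag:], axis=1)
    return out

def reinforce_across_frequency(ccf, half_width):
    """Combine each channel with its neighbors at the same lag.

    Assumption: the combination is a geometric mean over channels within
    +/- half_width, applied after half-wave rectification. Peaks that
    coincide across the neighboring channels survive; peaks present in
    only one channel are suppressed.
    """
    ccf = np.maximum(ccf, 0.0)  # keep positive correlation only
    n_ch = ccf.shape[0]
    out = np.empty_like(ccf)
    for c in range(n_ch):
        lo = max(0, c - half_width)
        hi = min(n_ch, c + half_width + 1)
        out[c] = np.prod(ccf[lo:hi], axis=0) ** (1.0 / (hi - lo))
    return out
```

In this sketch, `half_width` plays the role of the limited range of frequency channels: a larger value smooths over more channels but demands agreement across a wider band before a peak is kept.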
