Google Research

Binaural processing for robust speech recognition of degraded speech

  • Anjali Menon
  • Chanwoo Kim
  • Umpei Kurokawa
  • Richard M. Stern
IEEE Automatic Speech Recognition and Understanding Workshop (2017)


This paper discusses a new combination of techniques that help in improving the accuracy of speech recognition in adverse conditions using two microphones. Classic approaches toward binaural speech processing use some form of cross-correlation over time across the two sensors to effectively iso- late target speech from interferers. Several additional techniques using temporal and spatial masking have been proposed in the past to improve recognition accuracy in the presence of reverberation and interfering talkers. In this paper, we consider the use of cross-correlation across frequency over some limited range of frequency channels in addition to the existing methods of monaural and binaural processing. This has the effect of locating and reinforcing coincident peaks across frequency over the representation of binaural interaction and provides local smoothing over the specified range of frequencies. Combined with the temporal and spatial masking techniques mentioned above, this leads to significant improvements in binaural speech recognition.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work