Google Research


Odyssey 2020 The Speaker and Language Recognition Workshop (2020)


Noise robustness remains a challenging problem in on-device keyword spotting. Using multiple microphones can improve it, but while this increases accuracy, it inevitably raises computational complexity and tends to require more memory. In this paper, we propose a new neural-network-based architecture that takes multiple microphone signals as inputs. It achieves better accuracy with only a minimal increase in model size. Compared with a single-channel baseline that runs in parallel on each channel, the proposed architecture reduces the false reject (FR) rate by a relative 36.3% and 46.4% on dual-microphone clean and noisy test sets, respectively, at an operating point of 0.1 false accepts (FA) per hour.
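To make the reported metric concrete, the following is a minimal sketch of how a relative FR reduction is typically computed from two absolute FR rates measured at the same FA operating point. The function name and the baseline value are hypothetical and chosen only for illustration; the abstract reports the relative reductions, not the underlying absolute rates.

```python
def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Return the relative reduction of new_rate versus baseline_rate as a fraction.

    Both rates are assumed to be measured at the same operating point
    (for example, 0.1 false accepts per hour).
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


# Hypothetical absolute FR rates, NOT taken from the paper.
baseline_fr = 0.020                      # assumed single-channel baseline FR rate
proposed_fr = baseline_fr * (1 - 0.363)  # FR rate implied by a 36.3% relative reduction

reduction = relative_reduction(baseline_fr, proposed_fr)
print(f"relative FR reduction: {reduction:.1%}")
```

A relative reduction is distinct from an absolute one: with the assumed baseline above, the absolute FR rate would drop by well under one percentage point even though the relative reduction is 36.3%.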
