SMALL FOOTPRINT MULTI-CHANNEL KEYWORD SPOTTING

Jilong Wu
Yiteng Huang
Hyun Jin Park
Niranjan Subrahmanya
Patrick Violette
Odyssey 2020 The Speaker and Language Recognition Workshop (2020)

Abstract

Noise robustness remains a challenging problem in on-device keyword spotting,
and using multiple microphones can improve it. While this increases
accuracy, it inevitably pushes up computational complexity and tends to
require more memory. In this
paper, we propose a new neural-network-based architecture that takes multiple
microphone signals as inputs. It achieves better accuracy while incurring
only a minimal increase in model size. Compared with
a single-channel baseline that runs in parallel on each channel, the
proposed architecture achieves relative reductions in the false reject (FR)
rate of 27.2% and 31.8% on dual-microphone clean and noisy test sets,
respectively, at an operating point of 0.1 false accepts (FA) per hour.
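
To make the reported metric concrete, the sketch below shows one common way an FR rate at a fixed FA-per-hour operating point, and a relative reduction between two models, could be computed. It is an illustration only, not the authors' evaluation code: the function names, the thresholding rule (a detection fires when the score is strictly above the threshold), and all numbers are assumptions introduced here.

```python
def fr_at_fa_per_hour(positive_scores, negative_scores, negative_hours, target_fa_per_hour):
    """FR rate at the lowest threshold that keeps false accepts within the target.

    Hypothetical helper: assumes one detection score per utterance and that
    a detection fires when the score is strictly above the threshold.
    """
    # Number of false accepts the target rate allows over the negative audio.
    allowed_fa = int(target_fa_per_hour * negative_hours)
    ranked = sorted(negative_scores, reverse=True)
    # Placing the threshold at the (allowed_fa + 1)-th highest negative score
    # lets at most allowed_fa negatives exceed it.
    threshold = ranked[allowed_fa] if allowed_fa < len(ranked) else float("-inf")
    rejected = sum(1 for score in positive_scores if score <= threshold)
    return rejected / len(positive_scores)


def relative_reduction(baseline_fr, proposed_fr):
    """Relative FR-rate reduction of a proposed model against a baseline."""
    if baseline_fr <= 0:
        raise ValueError("baseline FR rate must be positive")
    return (baseline_fr - proposed_fr) / baseline_fr


# Made-up scores for illustration; these are not results from the paper.
negatives = [0.9, 0.8, 0.7, 0.2, 0.1]   # scores on 20 hours of keyword-free audio
positives = [0.95, 0.85, 0.75, 0.6, 0.5, 0.3, 0.72, 0.65, 0.99, 0.4]
fr = fr_at_fa_per_hour(positives, negatives, negative_hours=20, target_fa_per_hour=0.1)
```

Under this definition, a relative reduction of 27.2% means the proposed model's FR rate is 72.8% of the baseline's FR rate at the same FA-per-hour operating point.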

Research Areas