Multi-user VoiceFilter-Lite via Attentive Speaker Embedding

Rajeev Vijay Rikhye; Quan Wang; Qiao Liang; Yanzhang (Ryan) He; Ian Carmichael McGraw

ASRU 2021

Abstract

In this paper, we propose a solution that allows speaker-conditioned speech models, such as VoiceFilter-Lite, to support an arbitrary number of enrolled users in a single pass. This is achieved by applying an attention mechanism to multiple speaker embeddings to compute a single attentive embedding, which is then used as a side input to the model. We implemented multi-user VoiceFilter-Lite and evaluated it on two tasks: (1) a standard text-independent speaker verification task, where the input audio may contain overlapped speech; and (2) a personalized keyphrase detection task, where ASR has to detect keyphrases from multiple enrolled users in a noisy environment. Our experiments show that with up to four enrolled users, multi-user VoiceFilter-Lite significantly reduces speaker verification errors when there is overlapped speech, without hurting performance under other acoustic conditions. This attentive speaker embedding approach can also be easily applied to other speaker-conditioned models such as personal VAD and personalized ASR.
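
The attention step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes scaled dot-product attention and a query vector with the same dimension as the speaker embeddings. The abstract does not say how the query is obtained, so `query` here is hypothetical. It also uses plain Python lists instead of tensors. The helper names `attention_weights`, `attentive_embedding`, and `with_side_input` are illustrative only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Sequence[float],
                      enrolled: Sequence[Sequence[float]]) -> Vector:
    # Assumption: scaled dot-product score between a (hypothetical) query
    # vector and each of the K enrolled speaker embeddings.
    if not enrolled:
        raise ValueError("at least one enrolled speaker embedding is required")
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * e for q, e in zip(query, emb)) for emb in enrolled]
    return softmax(scores)


def attentive_embedding(query: Sequence[float],
                        enrolled: Sequence[Sequence[float]]) -> Vector:
    # Weighted sum of the K enrolled embeddings. The result has the size of a
    # single embedding, so the downstream model does not depend on K.
    weights = attention_weights(query, enrolled)
    dim = len(enrolled[0])
    return [sum(w * emb[i] for w, emb in zip(weights, enrolled)) for i in range(dim)]


def with_side_input(frame_features: Sequence[float],
                    query: Sequence[float],
                    enrolled: Sequence[Sequence[float]]) -> Vector:
    # Hypothetical: append the attentive embedding to one frame's acoustic
    # features, the way a speaker-conditioned model might consume a side input.
    return list(frame_features) + attentive_embedding(query, enrolled)
```

For example, with two orthogonal toy embeddings `[[1.0, 0.0], [0.0, 1.0]]` and a query equal to the first one, `attention_weights` assigns roughly 0.67 to the first user and 0.33 to the second. Because the output is a single fixed-size vector for any number of enrolled users, the model itself still runs only once per input.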
