Bone Conducted Signal Guided Speech Enhancement For Voice Assistant on Earbuds
Abstract
In this work we present a multi-modal, streaming enhancement network to improve speech recognition for voice assistants. The proposed model is guided by the bone-conducted signal (BCS) to separate interfering sources from the target speaker's signal. We trained the model on a simulated speech enhancement training set with a simulated BCS and fine-tuned it on a small, earbuds-specific training set consisting of less than 7 hours of speech. To account for distorted BCS, the enhancement module is complemented by a voice-activity-based decision that discards the enhanced output when the BCS contains no speech information. We also evaluate preprocessing the BCS to account for the low-pass characteristic of bone conduction, which lowers the required transmission bandwidth from the earbuds to the recognition device. The results show that the BCS bandwidth can be reduced to 500 Hz with only a small loss in word error rate (WER). The systems with and without bandwidth reduction are compared to a state-of-the-art multi-channel enhancement method on a realistic test set and outperform the multi-channel model on most of the considered sets.
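The two signal-level ideas in the abstract, band-limiting the BCS to exploit the low-pass characteristic of bone conduction, and gating the enhanced output by a voice-activity decision on the BCS, can be illustrated with a minimal sketch. This is not the paper's method; the filter design, the 500 Hz cutoff taken from the text, and the energy-threshold VAD (`bcs_has_speech`, `lowpass_bcs`) are illustrative assumptions.

```python
import numpy as np

def lowpass_bcs(bcs, fs=16000, cutoff=500.0, taps=101):
    """Band-limit the bone-conducted signal with a windowed-sinc FIR
    low-pass filter (hypothetical helper; 500 Hz matches the reduced
    bandwidth evaluated in the paper)."""
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(2.0 * cutoff / fs * n) * np.hamming(taps)
    h /= h.sum()  # unity DC gain
    return np.convolve(bcs, h, mode="same")

def bcs_has_speech(bcs_frame, energy_thresh=1e-4):
    """Toy energy-based stand-in for the voice-activity decision:
    keep the enhanced output only if the BCS frame carries energy."""
    return float(np.mean(np.square(bcs_frame))) > energy_thresh

fs = 16000
t = np.arange(fs) / fs
speech_like = 0.1 * np.sin(2 * np.pi * 200 * t)   # tone inside the BCS band
noise_like = 0.1 * np.sin(2 * np.pi * 4000 * t)   # tone above the cutoff

# In-band content survives the band limitation; out-of-band content is
# attenuated, so the VAD decision on the filtered BCS rejects it.
print(bcs_has_speech(lowpass_bcs(speech_like, fs=fs)))  # keep enhanced output
print(bcs_has_speech(lowpass_bcs(noise_like, fs=fs)))   # fall back
```

The gate implements the abstract's fallback: when the (band-limited) BCS shows no speech activity, the enhanced output is discarded rather than trusted.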