Disentangling speech from surroundings with neural embeddings

Ahmed Omran; Félix de Chaumont Quitry; Malcolm Slaney; Marco Tagliasacchi; Neil Zeghidour; Zalán Borsos

ICASSP 2023 (2023)

Abstract

We present a method to separate speech signals from noisy environments in the embedding space of a neural audio codec. We introduce a new training procedure that allows our model to produce structured encodings of audio waveforms given by embedding vectors, where one part of the embedding vector represents the speech signal, and the rest represent the environment. We achieve this by partitioning the embeddings of different input waveforms and training the model to faithfully reconstruct audio from mixed partitions, thereby ensuring each partition encodes a separate audio attribute. As use cases, we demonstrate the separation of speech from background noise or from reverberation characteristics. Our method also allows for targeted adjustments of the audio output characteristics.
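
The partition-mixing idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the embedding is split, so the sketch supposes that the first `split` channels of each frame carry speech and the remaining channels carry the environment. The function name `mix_partitions` and the frames-by-channels list layout are hypothetical, and the codec's encoder, decoder, and training loss are left out entirely.

```python
from typing import List, Tuple

# Hypothetical layout: one embedding is a list of frames,
# and each frame is a list of channel values.
Embedding = List[List[float]]


def mix_partitions(
    emb_a: Embedding, emb_b: Embedding, split: int
) -> Tuple[Embedding, Embedding]:
    """Swap the assumed environment partitions of two embeddings.

    Assumption: channels [0, split) encode speech and channels
    [split, end) encode the environment. Returns two embeddings:
    (speech of A + environment of B, speech of B + environment of A).
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same number of frames")
    for frame_a, frame_b in zip(emb_a, emb_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same number of channels")
        if not 0 <= split <= len(frame_a):
            raise ValueError("split must lie within the channel range")

    # Per frame, keep the leading channels of one input and
    # take the trailing channels from the other.
    speech_a_env_b = [fa[:split] + fb[split:] for fa, fb in zip(emb_a, emb_b)]
    speech_b_env_a = [fb[:split] + fa[split:] for fa, fb in zip(emb_a, emb_b)]
    return speech_a_env_b, speech_b_env_a


if __name__ == "__main__":
    # Toy embeddings: 2 frames, 4 channels each.
    a = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]]
    b = [[5.0, 6.0, 7.0, 8.0], [5.5, 6.5, 7.5, 8.5]]
    mixed_ab, mixed_ba = mix_partitions(a, b, split=2)
    print(mixed_ab)
    print(mixed_ba)
```

Under these assumptions, a decoder trained to reconstruct the correct audio from such mixed embeddings would be pushed to keep speech information in one partition and environment information in the other, which is the property the abstract describes.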

Research Areas

Machine intelligence