Ultra Low-Bitrate Speech Coding with Pretrained Transformers

Ali Siahkoohi

Bastiaan Kleijn

Jan Skoglund

Michael Chinen

Tom Denton

Interspeech 2022

Abstract

Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network-based speech codecs have recently demonstrated significant improvements in performance over traditional approaches. While this new generation of codecs is capable of synthesizing high-fidelity speech, their use of recurrent or convolutional layers often restricts their effective receptive fields, which prevents them from compressing speech efficiently. We propose to further reduce the bitrate of neural speech codecs through the use of pretrained Transformers, whose inductive bias makes them capable of exploiting long-range dependencies in the input signal. Our numerical experiments show that supplementing the encoder of a neural speech codec with Transformer speech embeddings yields a speech codec with a bitrate of $600\,\mathrm{bps}$ that outperforms the original neural speech codec in synthesized speech quality when both are trained at the same bitrate. Subjective human evaluations also suggest that the perceived quality of the resulting codec is comparable to or better than that of conventional codecs operating at 3--4 times the rate.
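
To make the bitrate figures in the abstract concrete, the following is a minimal arithmetic sketch rather than anything taken from the paper's implementation. It computes the conventional-codec rates implied by "3--4 times the rate" of $600\,\mathrm{bps}$, and the bit budget per frame under a purely hypothetical 20 ms frame duration. The frame duration and the names `bits_per_frame`, `PROPOSED_BPS`, and `FRAME_MS` are assumptions introduced for illustration only.

```python
def bits_per_frame(bitrate_bps: float, frame_ms: float) -> float:
    """Return the number of bits available for one frame of the given duration."""
    return bitrate_bps * frame_ms / 1000.0


# Bitrate of the proposed codec, as stated in the abstract.
PROPOSED_BPS = 600

# Rates implied by "3--4 times the rate" of the proposed codec.
conventional_bps = (3 * PROPOSED_BPS, 4 * PROPOSED_BPS)

# ASSUMPTION: a 20 ms frame is hypothetical and not taken from the paper.
FRAME_MS = 20

print("conventional codec rates (bps):", conventional_bps)
print("bits per frame at the proposed rate:", bits_per_frame(PROPOSED_BPS, FRAME_MS))
```

Under the assumed frame duration, the budget works out to a handful of bits per frame, which suggests why a codec at this rate would benefit from context that extends well beyond a single frame.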
