WARP-Q: Quality Prediction For Generative Neural Speech Codecs

Andrew Hines
Michael Chinen
Wissam Jassim
ICASSP 2021 (2021)
Abstract

Speech coding has been shown to achieve good speech quality using either waveform matching or parametric reconstruction. For very low bitrate streams, recently developed generative speech models can reconstruct high quality wideband speech from the bitstreams of standard parametric encoders at less than 3 kb/s. Generative codecs produce high quality speech by synthesising it with a DNN conditioned on the parametric input. Existing objective speech quality models (e.g. ViSQOL, POLQA) cannot accurately evaluate the quality of generatively coded speech, as they penalise it for signal differences that are not apparent in subjective listening test results. This paper presents WARP-Q, a full-reference objective speech quality metric that uses the dynamic time warping cost between MFCC representations of the reference and coded signals. It is robust to the codec changes introduced by low-bitrate neural vocoders. Evaluation using waveform matching, parametric and generative neural vocoder based codecs, as well as channel and environmental noise, shows that WARP-Q has better correlation and codec quality ranking for novel codecs than traditional metrics, as well as versatility and potential for handling additive noise and channel degradations.
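
The core idea described in the abstract is to score a coded signal against its reference by the dynamic time warping (DTW) cost between their MFCC representations. The sketch below illustrates that general idea only; it is not the paper's WARP-Q algorithm (which involves additional processing). The function name dtw_mfcc_cost, the use of librosa, the 16 kHz sample rate, the 13 MFCCs, and the path-length normalisation are all illustrative assumptions.

```python
# Minimal sketch of a DTW-over-MFCC quality cost, assuming librosa.
# Not the authors' WARP-Q implementation; names and parameters are
# illustrative assumptions.
import librosa
import numpy as np


def dtw_mfcc_cost(ref_path: str, deg_path: str,
                  sr: int = 16000, n_mfcc: int = 13) -> float:
    """Return a DTW alignment cost between MFCCs of a reference and a
    degraded (coded) signal. Lower cost ~ closer to the reference."""
    ref, _ = librosa.load(ref_path, sr=sr)   # reference (clean) speech
    deg, _ = librosa.load(deg_path, sr=sr)   # coded / degraded speech

    # MFCC features, shape (n_mfcc, n_frames) for each signal.
    mfcc_ref = librosa.feature.mfcc(y=ref, sr=sr, n_mfcc=n_mfcc)
    mfcc_deg = librosa.feature.mfcc(y=deg, sr=sr, n_mfcc=n_mfcc)

    # Dynamic time warping between the two MFCC sequences.
    # D is the accumulated cost matrix, wp the optimal warping path.
    D, wp = librosa.sequence.dtw(X=mfcc_ref, Y=mfcc_deg, metric="euclidean")

    # Normalise the terminal accumulated cost by the path length so the
    # score is comparable across utterances of different durations.
    return float(D[-1, -1] / len(wp))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cost = dtw_mfcc_cost("reference.wav", "coded.wav")
    print(f"DTW/MFCC cost (lower = closer to reference): {cost:.3f}")
```

Because DTW tolerates small temporal misalignments, a cost of this form is less sensitive to the waveform-level changes introduced by neural vocoders than sample-aligned signal comparisons, which is the motivation the abstract gives for the approach.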

Research Areas