Partial Monotonic Speech Quality Estimation in ViSQOL with Deep Lattice Networks

Andrew Hines
Michael Chinen
Journal of the Acoustical Society of America, vol. 149 (2021), pp. 3851-3861
Abstract

When predicting subjective quality as mean opinion score (MOS) for speech, a raw similarity score is often mapped onto the score dimension with a mapping function. Virtual Speech Quality Objective Listener (ViSQOL) uses monotonic one-dimensional mappings to evaluate speech. More recent models such as support vector regression (SVR) or deep neural networks (DNNs) use multidimensional input, which allows for more accurate predictions but does not guarantee the expected monotonic property. We propose integrating a multidimensional mapping function based on deep lattice networks (DLNs) into ViSQOL. DLNs also provide some insight into model interpretation and are robust to overfitting, leading to better out-of-sample performance. With the DLN, ViSQOL improved the speech mapping from the previous exponential mapping's 0.58 MSE to 0.24 MSE on a mixture of datasets, outperforming the fitted one-dimensional functions and SVR, as well as PESQ and POLQA. Additionally, we show that the DLN can be used to learn a quantile function that is well calibrated and provides a useful measure of uncertainty. With this quantile function, the model is able to provide useful quantile intervals for predictions instead of point estimates.
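The core idea of the abstract can be sketched with TensorFlow Lattice, the library that implements deep lattice networks. Below is a minimal, hedged sketch of a partial-monotonic calibrated lattice: one input (a similarity score) is constrained to map monotonically to MOS, while a second input is left unconstrained, and a pinball loss turns the same model into a quantile predictor. The two-input configuration, feature names, keypoint ranges, and lattice sizes are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of a partial-monotonic calibrated lattice with TensorFlow Lattice.
# Assumptions (not from the paper): two inputs in [0, 1], 5x5 lattice,
# feature names "similarity" and "aux_feature".
import numpy as np
import tensorflow as tf
import tensorflow_lattice as tfl

LATTICE_SIZES = [5, 5]  # vertices per input dimension (assumed)

similarity_in = tf.keras.Input(shape=(1,), name="similarity")
aux_in = tf.keras.Input(shape=(1,), name="aux_feature")  # hypothetical extra feature

# Piecewise-linear calibration of each input onto the lattice's vertex range.
# The similarity input is constrained to be monotonically increasing; the
# auxiliary input is left unconstrained.
calib_sim = tfl.layers.PWLCalibration(
    input_keypoints=np.linspace(0.0, 1.0, num=10),
    output_min=0.0,
    output_max=LATTICE_SIZES[0] - 1.0,
    monotonicity="increasing",
)(similarity_in)
calib_aux = tfl.layers.PWLCalibration(
    input_keypoints=np.linspace(0.0, 1.0, num=10),
    output_min=0.0,
    output_max=LATTICE_SIZES[1] - 1.0,
    monotonicity="none",
)(aux_in)

# The lattice enforces monotonicity only in the similarity dimension, which
# is the "partial monotonic" property; the MOS output is bounded to [1, 5].
mos_out = tfl.layers.Lattice(
    lattice_sizes=LATTICE_SIZES,
    monotonicities=["increasing", "none"],
    output_min=1.0,
    output_max=5.0,
)(tf.keras.layers.Concatenate()([calib_sim, calib_aux]))

model = tf.keras.Model(inputs=[similarity_in, aux_in], outputs=mos_out)

def pinball_loss(tau):
    """Quantile (pinball) loss: training with it makes the model predict
    the tau-th conditional quantile of MOS rather than the mean."""
    def loss(y_true, y_pred):
        err = y_true - y_pred
        return tf.reduce_mean(tf.maximum(tau * err, (tau - 1.0) * err))
    return loss

# A point-prediction model would compile with MSE; compiling with the
# pinball loss at, say, tau = 0.9 yields the 90th-percentile predictor,
# and pairing it with a tau = 0.1 model gives a quantile interval.
model.compile(optimizer="adam", loss=pinball_loss(0.9))
```

Because the monotonicity is a hard shape constraint on the calibration and lattice parameters, the guarantee holds for every input, not just on the training distribution; this is what a DNN or SVR mapping cannot provide.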