AMBIQUAL – a full reference objective quality metric for ambisonic spatial audio
Abstract
Streaming spatial audio over networks requires
efficient encoding techniques that compress the raw audio content
without compromising quality of experience. Streaming service
providers such as YouTube need a perceptually relevant objective
audio quality metric to monitor users’ perceived quality and
spatial localization accuracy. In this paper we introduce a full
reference objective spatial audio quality metric, AMBIQUAL,
which assesses both Listening Quality and Localization Accuracy.
In our solution both metrics are derived directly from the
B-format Ambisonic audio. The metric extends and adapts the
algorithm used in ViSQOLAudio, a full reference objective metric
designed for assessing speech and audio quality. In particular,
Listening Quality is derived from the omnidirectional channel
and Localization Accuracy is derived from a weighted sum
of similarity from B-format directional channels. This paper
evaluates whether the proposed AMBIQUAL objective spatial
audio quality metric can predict two factors: Listening Quality
and Localization Accuracy by comparing its predictions with
results from MUSHRA subjective listening tests. In particular,
we evaluated the Listening Quality and Localization Accuracy
of First and Third-Order Ambisonic audio compressed with
the OPUS 1.2 codec at various bitrates (32 and 128 kbps for First-Order,
256 and 512 kbps for Third-Order). The sample set for the tests comprised
both recorded and synthetic audio clips with a wide range of
time-frequency characteristics. To evaluate the Localization Accuracy
of compressed audio, a number of fixed and dynamic (moving
vertically and horizontally) source positions were selected for the
test samples. Results showed a strong correlation (PCC=0.919,
Spearman=0.882 for Listening Quality; PCC=0.854,
Spearman=0.842 for Localization Accuracy) between objective
quality scores derived from the B-format Ambisonic
audio using AMBIQUAL and subjective scores obtained in
MUSHRA listening tests. AMBIQUAL displays very promising
quality assessment predictions for spatial audio. Future work will
optimise the algorithm to generalise and validate it for any Higher-Order
Ambisonic format.
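
The agreement between objective and subjective scores reported above is expressed as a Pearson correlation coefficient (PCC) and a Spearman rank correlation. As a minimal sketch of how such figures can be computed, the following Python code implements both coefficients using only the standard library. The function names and the four score pairs are hypothetical and chosen purely for illustration; they are not the data or the evaluation code used in this paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for four test conditions (illustration only):
# objective similarity scores and mean MUSHRA ratings on a 0-100 scale.
objective_scores = [0.35, 0.62, 0.80, 0.91]
subjective_scores = [22.0, 55.0, 71.0, 93.0]

pcc = pearson(objective_scores, subjective_scores)
rho = spearman(objective_scores, subjective_scores)
print(f"PCC={pcc:.3f}; Spearman={rho:.3f}")
```

PCC measures how close the relationship is to linear, whereas the Spearman coefficient depends only on rank order, so a metric that orders conditions correctly but maps them non-linearly onto the subjective scale would show a high Spearman value and a lower PCC.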