Neil Birkbeck
Authored Publications
Rich features for perceptual quality assessment of UGC videos
Joong Yim
CVPR 2021
Video quality assessment for User Generated Content (UGC) is an important topic in both industry and academia. Most existing methods only focus on one aspect of perceptual quality assessment, such as technical quality or compression artifacts. In this paper, we create a large-scale dataset to comprehensively investigate characteristics of generic UGC video quality. Besides the subjective ratings and content labels of the dataset, we also propose a DNN-based framework to thoroughly analyze the importance of content, technical quality, and compression level in perceptual quality. Our model provides quality scores as well as human-friendly quality indicators, bridging the gap between low-level video signals and human perceptual quality. Experimental results show that our model achieves state-of-the-art correlation with Mean Opinion Scores (MOS).
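The pairing of an overall score with per-aspect indicators can be pictured with a toy pooling step. The sketch below is only an illustration: the subscore names, the linear fusion, and the weights are assumptions, not the DNN framework described in the paper.

```python
# Minimal sketch (not the paper's architecture): combine hypothetical per-aspect
# quality indicators into a single predicted MOS with a simple pooling rule.
import numpy as np

def predict_mos(content_score, technical_score, compression_score,
                weights=(0.3, 0.4, 0.3), bias=0.0):
    """Map three [0, 1] quality indicators to a MOS on a 1-5 scale.

    The linear pooling and the weights are illustrative assumptions.
    """
    indicators = np.array([content_score, technical_score, compression_score])
    pooled = float(np.dot(weights, indicators)) + bias    # weighted fusion
    return 1.0 + 4.0 * np.clip(pooled, 0.0, 1.0)          # rescale to MOS range

# Example: appealing content, decent capture quality, heavy compression artifacts.
print(predict_mos(content_score=0.8, technical_score=0.7, compression_score=0.3))
```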
Subjective Quality Assessment for YouTube UGC Dataset
Joong Yim
2020 IEEE International Conference on Image Processing
User Generated Content (UGC) has received a lot of interest in academia and industry recently. To facilitate compression-related research on UGC, YouTube has released a large-scale dataset [Wang2019UGCDataset]. The initial dataset only provided raw videos, which made quality assessment difficult. In this paper, we built a crowd-sourcing platform to collect and clean up subjective quality scores for the YouTube UGC dataset, and analyzed the distribution of Mean Opinion Score (MOS) along various dimensions. Some fundamental questions in video quality assessment are also investigated, such as the correlation between full-video MOS and corresponding chunk MOS, and the influence of chunk variation on quality score aggregation.
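The chunk-versus-full-video question raised in the abstract can be made concrete with a small aggregation sketch; the plain mean and the variation penalty below are illustrative assumptions, not the aggregation rules studied in the paper.

```python
# Minimal sketch: how might per-chunk MOS values be combined into one
# full-video estimate, and how might chunk-to-chunk variation matter?
import statistics

def aggregate_chunk_mos(chunk_mos, variation_weight=0.0):
    """Aggregate per-chunk MOS values into one full-video estimate.

    variation_weight > 0 penalizes videos whose chunks vary widely in quality.
    """
    mean_mos = statistics.fmean(chunk_mos)
    spread = statistics.pstdev(chunk_mos)
    return mean_mos - variation_weight * spread

chunks = [3.9, 4.1, 2.6, 4.0]            # hypothetical per-chunk MOS on a 1-5 scale
print(aggregate_chunk_mos(chunks))        # plain mean
print(aggregate_chunk_mos(chunks, 0.5))   # variation-aware estimate
```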
Video transcoding optimization based on input perceptual quality
Joong Yim
SPIE Optical Engineering + Applications: Applications of Digital Image Processing XLIII (2020)
Today's video transcoding pipelines choose transcoding parameters based on rate-distortion curves, which mainly focus on the relative quality difference between original and transcoded videos. By investigating the recently released YouTube UGC dataset, we found that people were more tolerant of quality changes in low quality inputs than in high quality inputs, which suggests that the current transcoding framework could be further optimized by considering input perceptual quality. An efficient machine learning based metric was proposed to detect low quality inputs, whose bitrate can be further reduced without hurting perceptual quality. To evaluate the impact on perceptual quality, we conducted a crowd-sourcing subjective experiment, and provided a methodology to evaluate statistical significance among different treatments. The results showed that the proposed quality guided transcoding framework is able to reduce the average bitrate by up to 5% with insignificant quality degradation.
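The quality-guided decision can be sketched as a simple gating rule; the classifier interface, the threshold, and the fixed per-video reduction below are assumptions made for illustration, not the paper's actual pipeline.

```python
# Minimal sketch of quality-guided transcoding: when a (hypothetical) metric
# flags the input as low perceptual quality, trim the target bitrate.
def choose_target_bitrate(default_bitrate_kbps, predicted_input_quality,
                          low_quality_threshold=0.5, reduction=0.05):
    """Return a target bitrate, lowered when the input already looks low quality."""
    if predicted_input_quality < low_quality_threshold:
        return default_bitrate_kbps * (1.0 - reduction)
    return default_bitrate_kbps

# Example: an already-degraded upload vs. a pristine one.
print(choose_target_bitrate(2500, predicted_input_quality=0.3))  # 2375.0
print(choose_target_bitrate(2500, predicted_input_quality=0.9))  # 2500
```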
On the first JND and Break in Presence of 360-degree content: An exploratory study
Roberto G. de A. Azevedo
Ivan Janatra
Pascal Frossard
MMVE '19: Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems (2019)
Unlike traditional planar 2D visual content, immersive 360-degree images and videos undergo particular processing steps and are intended to be consumed via head-mounted displays (HMDs). To get a deeper understanding of the perception of 360-degree visual distortions when consumed through HMDs, we perform an exploratory task-based subjective study in which we asked subjects to identify the first noticeable difference and break-in-presence points while incrementally adding specific compression artifacts. The results of our study give insights into the range of allowed visual distortions for 360-degree content, show that the added visual distortions are more tolerable in mono than in stereoscopic 3D, and identify issues with current 360-degree objective quality metrics.
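The incremental procedure can be pictured as a stepping loop over distortion levels; the level grid and the response callbacks below are hypothetical placeholders for the HMD-based task, not the study's actual protocol.

```python
# Minimal sketch: step up a compression level until the viewer first reports a
# noticeable difference (JND) and, later, a break in presence (BIP).
def find_jnd_and_bip(levels, viewer_notices, presence_breaks):
    """Return the first level judged noticeable and the first breaking presence."""
    jnd_level, bip_level = None, None
    for level in levels:                      # e.g. increasing quantization strength
        if jnd_level is None and viewer_notices(level):
            jnd_level = level
        if bip_level is None and presence_breaks(level):
            bip_level = level
    return jnd_level, bip_level

# Toy example with simulated viewer responses.
levels = range(10, 51, 5)
jnd, bip = find_jnd_and_bip(levels,
                            viewer_notices=lambda q: q >= 25,
                            presence_breaks=lambda q: q >= 40)
print(jnd, bip)  # 25 40
```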
Visual Distortions in 360-degree Videos
Roberto G. de A. Azevedo
Francesca De Simone
Ivan Janatra
Pascal Frossard
IEEE Transactions on Circuits and Systems for Video Technology (2019)
Omnidirectional (or 360-degree) images and videos are emergent signals being used in many areas such as robotics and virtual/augmented reality. In particular, for virtual reality applications, they allow an immersive experience in which the user can interactively navigate through a scene with three degrees of freedom, wearing a head-mounted display. Current approaches for capturing, processing, delivering, and displaying 360-degree content, however, present many open technical challenges and introduce several types of distortions in the visual signal. Some of the distortions are specific to the nature of 360-degree images and often differ from those encountered in classical visual communication frameworks. This paper provides a first comprehensive review of the most common visual distortions that alter 360-degree signals going through the different processing elements of the visual communication pipeline. While their impact on viewers' visual perception and the immersive experience at large is still unknown (and thus an open research topic), this review serves the purpose of proposing a taxonomy of the visual distortions that can be encountered in 360-degree signals. Their underlying causes in the end-to-end 360-degree content distribution pipeline are identified. This taxonomy is essential as a basis for comparing different processing techniques, such as visual enhancement, encoding, and streaming strategies, and allowing the effective design of new algorithms and applications. It is also a useful resource for the design of psycho-visual studies aiming to characterize human perception of 360-degree content in interactive and immersive applications.
Omnidirectional video encoding and delivery are rapidly evolving fields, where choosing an efficient representation for storage and transmission of pixel data is critical. Given that there are a number of projections (pixel representations), a projection-independent measure is needed to evaluate the merits of different options. We present a technique to evaluate projection quality by rendering virtual views and use it to evaluate three projections in common use: Equirectangular, Cubemap, and Equi-Angular Cubemap. Through evaluation on dozens of videos, our metrics rank the projection types consistently with pixel density computations and small-scale user studies.
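A minimal sketch of the projection-independent evaluation idea follows: render the same virtual viewports from each projection and score them against reference viewports. The render_viewport helper is a hypothetical placeholder and PSNR is used only as an example metric; the paper's actual metrics may differ.

```python
# Minimal sketch: average viewport fidelity of a projection over sampled view
# directions, measured against viewports rendered from the original sphere.
import numpy as np

def psnr(reference, test, peak=255.0):
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak * peak / mse)

def score_projection(reference_viewports, projected_viewports):
    """Average viewport PSNR for one projection over sampled view directions."""
    return np.mean([psnr(r, t) for r, t in zip(reference_viewports, projected_viewports)])

# Usage (assuming a hypothetical render_viewport(video, projection, yaw, pitch)):
# views = [(yaw, 0) for yaw in range(0, 360, 45)]
# refs  = [render_viewport(video, "reference", y, p) for y, p in views]
# for proj in ("equirectangular", "cubemap", "equi-angular cubemap"):
#     tests = [render_viewport(video, proj, y, p) for y, p in views]
#     print(proj, score_projection(refs, tests))
```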
Deformable block-based motion estimation in omnidirectional image sequences
Francesca De Simone
Pascal Frossard
IEEE 19th International Workshop on Multimedia Signal Processing (2017)
This paper presents an extension of block-based motion estimation for omnidirectional videos, based on a camera and translational object motion model that accounts for the spherical geometry of the imaging system. We use this model to design a new algorithm to perform block matching in sequences of panoramic frames that are the result of the equirectangular projection. Experimental results demonstrate that significant gains can be achieved with respect to the classical exhaustive block matching algorithm (EBMA) in terms of accuracy of motion prediction. In particular, average quality improvements of up to approximately 6 dB in terms of Peak Signal to Noise Ratio (PSNR), 0.043 in terms of Structural SIMilarity index (SSIM), and 2 dB in terms of spherical PSNR can be achieved on the predicted frames.
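The geometry-aware matching idea can be sketched by warping a block through the sphere rather than shifting it in the image plane; the rotation parameterization, nearest-neighbor sampling, and SAD cost below are illustrative simplifications, not the paper's exact deformable model.

```python
# Minimal sketch: instead of displacing a block by (dx, dy), map its pixels to
# the sphere, apply a candidate rotation, and sample the reference equirectangular
# frame at the re-projected positions.
import numpy as np

def pix_to_sphere(x, y, w, h):
    lon = (x / w) * 2 * np.pi - np.pi           # longitude in [-pi, pi)
    lat = np.pi / 2 - (y / h) * np.pi           # latitude in [-pi/2, pi/2]
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)

def sphere_to_pix(v, w, h):
    lon = np.arctan2(v[..., 1], v[..., 0])
    lat = np.arcsin(np.clip(v[..., 2], -1, 1))
    return (lon + np.pi) / (2 * np.pi) * w, (np.pi / 2 - lat) / np.pi * h

def block_sad_under_rotation(cur, ref, x0, y0, bs, rot):
    """SAD between a block of `cur` and `ref` sampled after a 3x3 rotation `rot`."""
    h, w = cur.shape
    ys, xs = np.mgrid[y0:y0 + bs, x0:x0 + bs]
    pts = pix_to_sphere(xs, ys, w, h) @ rot.T                # rotate unit vectors
    rx, ry = sphere_to_pix(pts, w, h)
    ref_block = ref[np.clip(ry.round().astype(int), 0, h - 1),
                    np.clip(rx.round().astype(int), 0, w - 1)]  # nearest-neighbor sample
    return np.abs(cur[y0:y0 + bs, x0:x0 + bs].astype(int) - ref_block.astype(int)).sum()

# Example candidate: a 2-degree yaw rotation.
# theta = np.deg2rad(2.0)
# Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
#                [np.sin(theta),  np.cos(theta), 0],
#                [0, 0, 1]])
# cost = block_sad_under_rotation(cur_frame, ref_frame, x0=100, y0=60, bs=16, rot=Rz)
```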
Geometry-driven quantization for omnidirectional image coding
Francesca De Simone
Pascal Frossard
Paul Wilkins
Anil Kokaram
Picture Coding Symposium (PCS) (2016)
In this paper we propose a method to adapt the quantization tables of typical block-based transform codecs when the input to the encoder is a panoramic image resulting from the equirectangular projection of a spherical image. When the visual content is projected from the panorama to the viewport, a frequency shift occurs. The quantization can be adapted accordingly: the quantization step sizes that would be optimal to quantize the transform coefficients of the viewport image block can be used to quantize the coefficients of the panoramic block. As a proof of concept, the proposed quantization strategy has been used in JPEG compression. Results show that a rate reduction of up to 2.99% can be achieved for the same perceptual quality of the spherical signal with respect to standard quantization.
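A rough way to picture the adaptation: equirectangular rows away from the equator are horizontally stretched by roughly 1/cos(latitude), so panorama blocks there can tolerate coarser quantization of horizontal frequencies. The per-row scaling below is an illustrative approximation, not the exact derivation in the paper.

```python
# Minimal sketch: scale the horizontal-frequency entries of a JPEG-style
# quantization table according to the block row's latitude in the panorama.
import numpy as np

BASE_LUMA_Q = np.full((8, 8), 16, dtype=np.float64)      # placeholder base table

def adapted_quant_table(block_row, image_height, base=BASE_LUMA_Q):
    """Coarser horizontal-frequency quantization for blocks nearer the poles."""
    y_center = block_row + 4                               # latitude of the block center
    lat = np.pi / 2 - (y_center / image_height) * np.pi
    stretch = 1.0 / max(np.cos(lat), 1e-3)                 # horizontal stretch factor
    scale = np.tile(np.linspace(1.0, stretch, 8), (8, 1))  # columns = horizontal frequency
    return np.clip(base * scale, 1, 255)

print(adapted_quant_table(block_row=0, image_height=1024)[0])    # near the pole
print(adapted_quant_table(block_row=508, image_height=1024)[0])  # near the equator
```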
Temporal Synchronization of Multiple Audio Signals
Sasi Inguva
Andy Crawford
Hugh Denman
Anil Kokaram
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy (2014)
Given the proliferation of consumer media recording devices, events often give rise to a large number of recordings. These recordings are taken from different spatial positions and do not have reliable timestamp information. In this paper, we present two robust graph-based approaches for synchronizing multiple audio signals. The graphs are constructed atop the over-determined system resulting from pairwise signal comparison using cross-correlation of audio features. The first approach uses a Minimum Spanning Tree (MST) technique, while the second uses Belief Propagation (BP) to solve the system. Both approaches provide excellent solutions and robustness to pairwise outliers; however, the MST approach is much less complex than BP. In addition, an experimental comparison of audio feature-based synchronization shows that spectral flatness outperforms the zero-crossing rate and signal energy.
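The MST-style approach can be sketched as follows: estimate pairwise offsets by cross-correlation, keep the most confident pairs as a spanning tree, and accumulate offsets from a reference recording. The raw-signal cross-correlation and peak-value confidence below are simplifying assumptions standing in for the paper's audio features.

```python
# Minimal sketch of graph-based audio synchronization via a spanning tree.
import numpy as np

def pairwise_offset(a, b):
    """Estimated start-time lag of b relative to a (t_b - t_a, in samples),
    plus the correlation peak as a confidence score."""
    corr = np.correlate(a, b, mode="full")
    lag = int(np.argmax(corr)) - (len(b) - 1)
    return lag, float(corr.max())

def synchronize(signals):
    """Per-signal start offsets (in samples) relative to signal 0."""
    n = len(signals)
    edges = [(i, j, *pairwise_offset(signals[i], signals[j]))
             for i in range(n) for j in range(i + 1, n)]
    edges.sort(key=lambda e: -e[3])           # most confident pairs first (Kruskal-style)
    parent = list(range(n))

    def find(x):                              # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for i, j, lag, _ in edges:
        if find(i) != find(j):
            parent[find(i)] = find(j)
            tree.append((i, j, lag))

    # Propagate offsets over the tree, starting from signal 0.
    offsets, changed = {0: 0}, True
    while changed and len(offsets) < n:
        changed = False
        for i, j, lag in tree:
            if i in offsets and j not in offsets:
                offsets[j] = offsets[i] + lag; changed = True
            elif j in offsets and i not in offsets:
                offsets[i] = offsets[j] - lag; changed = True
    return offsets
```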