Neil Birkbeck

Authored Publications
    Video quality assessment for User Generated Content (UGC) is an important topic in both industry and academia. Most existing methods focus on only one aspect of perceptual quality assessment, such as technical quality or compression artifacts. In this paper, we create a large-scale dataset to comprehensively investigate the characteristics of generic UGC video quality. Besides the subjective ratings and content labels of the dataset, we also propose a DNN-based framework to thoroughly analyze the importance of content, technical quality, and compression level in perceptual quality. Our model provides quality scores as well as human-friendly quality indicators, bridging the gap between low-level video signals and human perceptual quality. Experimental results show that our model achieves state-of-the-art correlation with Mean Opinion Scores (MOS).
    User Generated Content (UGC) has received a lot of interest in academia and industry recently. To facilitate compression-related research on UGC, YouTube has released a large-scale dataset [Wang2019UGCDataset]. The initial dataset provided only raw videos, which made quality assessment difficult. In this paper, we built a crowd-sourcing platform to collect and clean up subjective quality scores for the YouTube UGC dataset, and analyzed the distribution of Mean Opinion Score (MOS) along various dimensions. Some fundamental questions in video quality assessment are also investigated, such as the correlation between full-video MOS and corresponding chunk MOS, and the influence of chunk variation on quality score aggregation.
    Today's video transcoding pipelines choose transcoding parameters based on rate-distortion curves, which mainly focus on the relative quality difference between the original and transcoded videos. By investigating the recently released YouTube UGC dataset, we found that people are more tolerant of quality changes in low-quality inputs than in high-quality inputs, which suggests that the current transcoding framework could be further optimized by considering input perceptual quality. An efficient machine-learning-based metric is proposed to detect low-quality inputs, whose bitrate can be further reduced without hurting perceptual quality. To evaluate the impact on perceptual quality, we conducted a crowd-sourcing subjective experiment and provide a methodology to evaluate statistical significance among different treatments. The results show that the proposed quality-guided transcoding framework is able to reduce the average bitrate by up to 5% with insignificant quality degradation.
    On the first JND and Break in Presence of 360-degree content: An exploratory study
    Roberto G. de A. Azevedo
    Ivan Janatra
    Pascal Frossard
    MMVE '19: Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems (2019)
    Unlike traditional planar 2D visual content, immersive 360-degree images and videos undergo particular processing steps and are intended to be consumed via head-mounted displays (HMDs). To gain a deeper understanding of how 360-degree visual distortions are perceived through HMDs, we perform an exploratory task-based subjective study in which subjects were asked to identify the first noticeable difference and break-in-presence points as specific compression artifacts were incrementally added. The results of our study give insights into the range of allowable visual distortions for 360-degree content, show that the added visual distortions are more tolerable in mono than in stereoscopic 3D, and identify issues with current 360-degree objective quality metrics.
    Visual Distortions in 360-degree Videos
    Roberto G. de A. Azevedo
    Francesca De Simone
    Ivan Janatra
    Pascal Frossard
    IEEE Transactions on Circuits and Systems for Video Technology (2019)
    Omnidirectional (or 360-degree) images and videos are emergent signals used in many areas such as robotics and virtual/augmented reality. In particular, for virtual reality applications, they allow an immersive experience in which the user can interactively navigate through a scene with three degrees of freedom, wearing a head-mounted display. Current approaches for capturing, processing, delivering, and displaying 360-degree content, however, present many open technical challenges and introduce several types of distortions in the visual signal. Some of these distortions are specific to the nature of 360-degree images and often differ from those encountered in classical visual communication frameworks. This paper provides a first comprehensive review of the most common visual distortions that alter 360-degree signals as they pass through the different processing elements of the visual communication pipeline. While their impact on viewers' visual perception and the immersive experience at large is still unknown (and thus an open research topic), this review proposes a taxonomy of the visual distortions that can be encountered in 360-degree signals, and identifies their underlying causes in the end-to-end 360-degree content distribution pipeline. This taxonomy is essential as a basis for comparing different processing techniques, such as visual enhancement, encoding, and streaming strategies, and for the effective design of new algorithms and applications. It is also a useful resource for the design of psycho-visual studies aiming to characterize human perception of 360-degree content in interactive and immersive applications.
    Quantitative evaluation of omnidirectional video quality
    Chip Brown
    Rob Suderman
    Quality of Multimedia Experience (QoMEX) (2017)
    Omnidirectional video encoding and delivery are rapidly evolving fields, where choosing an efficient representation for storage and transmission of pixel data is critical. Given that a number of projections (pixel representations) exist, a projection-independent measure is needed to evaluate the merits of the different options. We present a technique to evaluate projection quality by rendering virtual views, and use it to evaluate three projections in common use: Equirectangular, Cubemap, and Equi-Angular Cubemap. Through evaluation on dozens of videos, our metrics rank the projection types consistently with pixel density computations and small-scale user studies.
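The pixel density computations mentioned above can be illustrated with a short sketch (not the paper's implementation): in an equirectangular projection, the solid angle covered by each pixel shrinks with the cosine of latitude, so polar rows spend many pixels on very little of the sphere. The function name below is illustrative.

```python
import numpy as np

def equirect_solid_angles(width, height):
    """Per-pixel solid angle (steradians) of an equirectangular image.

    Each pixel spans dphi x dtheta in longitude/latitude; its solid
    angle is dphi * dtheta * cos(latitude), so pixels near the poles
    cover far less of the sphere than pixels at the equator.
    """
    dphi = 2.0 * np.pi / width
    dtheta = np.pi / height
    # Latitude at the center of each pixel row, from ~+pi/2 to ~-pi/2.
    lat = np.pi / 2 - (np.arange(height) + 0.5) * dtheta
    row_sa = dphi * dtheta * np.cos(lat)          # solid angle per pixel in a row
    return np.tile(row_sa[:, None], (1, width))   # identical across each row

sa = equirect_solid_angles(512, 256)
# Sanity check: all pixel solid angles must sum to the full sphere, 4*pi.
print(bool(np.isclose(sa.sum(), 4 * np.pi, rtol=1e-3)))  # -> True
```

The strong equator-to-pole density imbalance this exposes is one reason projections such as the Equi-Angular Cubemap, which distribute samples more evenly, compare favorably.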
    Deformable block based motion estimation in omnidirectional image sequences
    Francesca De Simone
    Pascal Frossard
    IEEE 19th International Workshop on Multimedia Signal Processing (2017)
    This paper presents an extension of block-based motion estimation to omnidirectional videos, based on a camera and translational object motion model that accounts for the spherical geometry of the imaging system. We use this model to design a new algorithm for block matching in sequences of panoramic frames resulting from the equirectangular projection. Experimental results demonstrate that significant gains can be achieved over the classical exhaustive block matching algorithm (EBMA) in terms of accuracy of motion prediction. In particular, average quality improvements of up to approximately 6 dB in Peak Signal-to-Noise Ratio (PSNR), 0.043 in Structural SIMilarity index (SSIM), and 2 dB in spherical PSNR can be achieved on the predicted frames.
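For context, the planar baseline the paper improves on, exhaustive block matching (EBMA), can be sketched as follows. This is a generic textbook version operating on flat frames, not the paper's spherical-geometry algorithm, and the function name is illustrative.

```python
import numpy as np

def ebma(ref, cur, block=8, search=4):
    """Classical exhaustive block matching (EBMA) on planar frames.

    For each block of the current frame, scan every integer displacement
    within +/- `search` pixels in the reference frame and keep the one
    minimizing the sum of absolute differences (SAD). The paper replaces
    this planar translational model with one that follows the spherical
    geometry of equirectangular frames.
    """
    h, w = cur.shape
    motion = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            blk = cur[by:by + block, bx:bx + block]
            best, best_sad = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        sad = np.abs(ref[y:y + block, x:x + block] - blk).sum()
                        if sad < best_sad:
                            best_sad, best = sad, (dy, dx)
            motion[(by, bx)] = best
    return motion

# Rolling ref forward by (2, 3) means current blocks match the reference
# at displacement (-2, -3) for interior blocks.
rng = np.random.default_rng(0)
ref = rng.random((32, 32))
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))
print(ebma(ref, cur)[(8, 8)])  # -> (-2, -3)
```

The fixed rectangular search window is exactly what breaks down near the poles of an equirectangular frame, where a spherical motion model matches the content's true trajectory much better.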
    Geometry-driven quantization for omnidirectional image coding
    Francesca De Simone
    Pascal Frossard
    Paul Wilkins
    Anil Kokaram
    Picture Coding Symposium (PCS) (2016)
    In this paper we propose a method to adapt the quantization tables of typical block-based transform codecs when the input to the encoder is a panoramic image resulting from the equirectangular projection of a spherical image. When the visual content is projected from the panorama to the viewport, a frequency shift occurs, and the quantization can be adapted accordingly: the quantization step sizes that would be optimal for the transform coefficients of the viewport image block can be used to quantize the coefficients of the panoramic block. As a proof of concept, the proposed quantization strategy has been applied to JPEG compression. Results show that a rate reduction of up to 2.99% can be achieved at the same perceptual quality of the spherical signal with respect to standard quantization.
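The geometric intuition can be sketched roughly: at latitude lat, the equirectangular panorama stretches content horizontally by 1/cos(lat) relative to the viewport, shifting panorama frequencies down and allowing coarser quantization there. The per-row scalar below is a deliberately simplified illustration of that principle, not the paper's per-coefficient table adaptation, and the function name is hypothetical.

```python
import numpy as np

def adapted_quant_step(base_step, row, height):
    """Scale a quantization step for a block row of an equirectangular image.

    At latitude `lat`, the panorama stretches content horizontally by
    1/cos(lat) relative to the viewport. A coarse way to exploit the
    resulting frequency shift (a simplification of the paper's method)
    is to grow the quantization step with the stretch factor.
    """
    lat = np.pi / 2 - (row + 0.5) * np.pi / height   # latitude of row center
    stretch = 1.0 / max(np.cos(lat), 1e-3)           # clamp blow-up at the poles
    return base_step * stretch

# An equator row keeps the base step; rows nearer a pole quantize coarser.
print(round(adapted_quant_step(16, row=128, height=256), 2))  # -> 16.0
```

In the actual codec setting this scaling would be applied per transform coefficient rather than as a single per-row scalar, since only the horizontal frequencies are shifted by the stretch.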
    Temporal Synchronization of Multiple Audio Signals
    Sasi Inguva
    Andy Crawford
    Hugh Denman
    Anil Kokaram
    Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy (2014)
    Given the proliferation of consumer media recording devices, events often give rise to a large number of recordings. These recordings are taken from different spatial positions and do not have reliable timestamp information. In this paper, we present two robust graph-based approaches for synchronizing multiple audio signals. The graphs are constructed atop the over-determined system resulting from pairwise signal comparison using cross-correlation of audio features. The first approach uses a Minimum Spanning Tree (MST) technique, while the second uses Belief Propagation (BP) to solve the system. Both approaches provide excellent solutions and robustness to pairwise outliers; however, the MST approach is much less complex than BP. In addition, an experimental comparison of feature-based audio synchronization shows that spectral flatness outperforms the zero-crossing rate and signal energy.
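The spanning-tree variant can be sketched as follows: estimate pairwise offsets by cross-correlation, treat each correlation peak as an edge confidence, grow a maximum-confidence spanning tree (Prim's algorithm), and accumulate offsets from a reference signal along tree edges. This is a minimal raw-sample illustration; the paper works on audio features such as spectral flatness rather than raw samples, and `pairwise_offset`/`mst_sync` are hypothetical names.

```python
import numpy as np

def pairwise_offset(a, b):
    """Return (lag, peak): the lag that best aligns `b` to `a`, found as
    the argmax of the full cross-correlation, plus the peak value."""
    corr = np.correlate(a, b, mode="full")
    return int(np.argmax(corr)) - (len(b) - 1), float(np.max(corr))

def mst_sync(signals):
    """Global offsets (relative to signals[0]) via a spanning tree.

    Prim's algorithm: repeatedly attach the out-of-tree signal whose
    cross-correlation peak with an in-tree signal is highest, and chain
    its pairwise lag onto the in-tree signal's accumulated offset.
    """
    n = len(signals)
    off, in_tree, cache = {0: 0}, {0}, {}
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                if (i, j) not in cache:
                    cache[(i, j)] = pairwise_offset(signals[i], signals[j])
                d, conf = cache[(i, j)]
                if best is None or conf > best[0]:
                    best = (conf, i, j, d)
        _, i, j, d = best
        off[j] = off[i] + d          # chain the pairwise lag along the tree edge
        in_tree.add(j)
    return [off[k] for k in range(n)]

# Three clips of one recording, started 0, 10, and 25 samples apart.
rng = np.random.default_rng(1)
x = rng.standard_normal(400)
clips = [x[0:250], x[10:260], x[25:275]]
print(mst_sync(clips))  # -> [0, 10, 25]
```

The over-determined pairwise system the paper describes is what makes this robust: with all n(n-1)/2 comparisons available, the tree can route around any single outlier pair, and BP refines the same system at higher cost.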
    IntellEditS: Intelligent Learning-Based Editor of Segmentations
    Adam P. Harrison
    Michal Sofka
    MICCAI (3) (2013), pp. 235-242
    Rapid Multi-organ Segmentation Using Context Integration and Discriminative Models
    Nathan Lay
    Jingdan Zhang
    Shaohua Kevin Zhou
    IPMI (2013), pp. 450-462
    Precise Segmentation of Multiple Organs in CT Volumes Using Learning-Based Approach and Information Theory
    Chao Lu
    Yefeng Zheng
    Jingdan Zhang
    Timo Kohlberger
    Christian Tietjen
    Thomas Boettger
    James S. Duncan
    Shaohua Kevin Zhou
    MICCAI (2) (2012), pp. 462-469
    Automatic Multi-organ Segmentation Using Learning-Based Segmentation and Level Set Optimization
    Timo Kohlberger
    Michal Sofka
    Jingdan Zhang
    Jens Wetzl
    Jens N. Kaftan
    Jérôme Declerck
    Shaohua Kevin Zhou
    MICCAI (3) (2011), pp. 338-345
    Basis constrained 3D scene flow on a dynamic proxy
    Dana Cobzas
    Martin Jägersand
    ICCV (2011), pp. 1967-1974
    Multi-stage Learning for Robust Lung Segmentation in Challenging CT Volumes
    Michal Sofka
    Jens Wetzl
    Jingdan Zhang
    Timo Kohlberger
    Jens N. Kaftan
    Jérôme Declerck
    Shaohua Kevin Zhou
    MICCAI (3) (2011), pp. 667-674
    Integrated Detection Network (IDN) for pose and boundary estimation in medical images
    Michal Sofka
    Kristof Ralovich
    Jingdan Zhang
    Shaohua Kevin Zhou
    ISBI (2011), pp. 294-299
    Performance evaluation of monocular predictive display
    Adam Rachmielowski
    Martin Jägersand
    ICRA (2010), pp. 5309-5314
    Predictive display for mobile manipulators in unknown environments using online vision-based monocular modeling and localization
    David Lovi
    Alejandro Hernandez Herdocia
    Adam Rachmielowski
    Martin Jägersand
    Dana Cobzas
    IROS (2010), pp. 5792-5798
    An interactive graph cut method for brain tumor segmentation
    Dana Cobzas
    Martin Jägersand
    Albert Murtha
    Tibor Kesztyues
    WACV (2009), pp. 1-7
    Realtime Visualization of Monocular Data for 3D Reconstruction
    Adam Rachmielowski
    Martin Jägersand
    Dana Cobzas
    CRV (2008), pp. 196-202
    3D Variational Brain Tumor Segmentation using a High Dimensional Feature Set
    Dana Cobzas
    Mark W. Schmidt
    Martin Jägersand
    Albert Murtha
    ICCV (2007), pp. 1-8
    A Dimension Abstraction Approach to Vectorization in Matlab
    Jonathan Levesque
    José Nelson Amaral
    CGO (2007), pp. 115-130
    Variational Shape and Reflectance Estimation Under Changing Light and Viewpoints
    Dana Cobzas
    Peter F. Sturm
    Martin Jägersand
    ECCV (1) (2006), pp. 536-549
    Object Centered Stereo: Displacement Map Estimation Using Texture and Shading
    Dana Cobzas
    Martin Jägersand
    3DPVT (2006), pp. 790-797
    Visual Tracking Using Active Appearance Models
    Martin Jägersand
    CRV (2004), pp. 2-9