Balu Adsumilli

Dr. Balu Adsumilli is currently the Head of the Media Algorithms group at YouTube/Google, where he and his team research and develop algorithms that transform uploaded videos into formats played across all your devices. Over the past several years, he was instrumental in building and scaling technologies in video processing, computer vision, video compression, and video quality, work that garnered two Technology and Engineering Emmy Awards. Prior to YouTube, he was a Sr. Manager of Advanced Technology at GoPro, where he led the image capture architecture and software teams and developed the ProTune mode in collaboration with ACES and Technicolor. This paved the way for GoPro cameras to capture industry-neutral formats and enabled their widespread use in the movie and television industry.

Dr. Adsumilli serves on the board of the Television Academy, on the Visual Effects Society board, on the NATAS technical committee, on the IEEE Multimedia Signal Processing (MMSP) Technical Committee, and on the ACM Mile High Video Steering Committee. He has co-authored more than 80 papers and holds 80 granted patents, with many more pending. He serves on the technical program committees (TPCs) and organizing committees of various conferences and has organized numerous workshops. He is a senior member of IEEE and an active member of ACM, SMPTE, VES, and SPIE. He received his PhD from the University of California, Santa Barbara, and his master's degree from the University of Wisconsin-Madison.
Authored Publications
    Video quality assessment for User Generated Content (UGC) is an important topic in both industry and academia. Most existing methods focus on only one aspect of perceptual quality assessment, such as technical quality or compression artifacts. In this paper, we create a large-scale dataset to comprehensively investigate the characteristics of generic UGC video quality. Besides the subjective ratings and content labels of the dataset, we also propose a DNN-based framework to thoroughly analyze the importance of content, technical quality, and compression level in perceptual quality. Our model provides quality scores as well as human-friendly quality indicators, bridging the gap between low-level video signals and human perceptual quality. Experimental results show that our model achieves state-of-the-art correlation with Mean Opinion Scores (MOS).
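The abstract reports "correlation with Mean Opinion Scores" without naming the coefficient. Video quality work commonly reports the Pearson linear correlation (PLCC) and the Spearman rank-order correlation (SROCC), so the following is a minimal stdlib-only sketch of both, assuming plain lists of predicted scores and MOS values. The function names are illustrative and not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

For example, `spearman([3.1, 4.2, 2.0, 3.8], [3.0, 4.5, 2.2, 3.6])` is 1.0 (up to floating-point rounding) because both lists order the four videos identically, while PLCC on the raw values would also reflect how linearly the scores track MOS.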
    Today's video transcoding pipelines choose transcoding parameters based on rate-distortion curves, which mainly focus on the relative quality difference between original and transcoded videos. By investigating the recently released YouTube UGC dataset, we found that people were more tolerant of quality changes in low-quality inputs than in high-quality inputs, which suggests that current transcoding frameworks could be further optimized by considering input perceptual quality. An efficient machine-learning-based metric was proposed to detect low-quality inputs, whose bitrate can be further reduced without hurting perceptual quality. To evaluate the impact on perceptual quality, we conducted a crowd-sourcing subjective experiment and provided a methodology to evaluate statistical significance among different treatments. The results showed that the proposed quality-guided transcoding framework is able to reduce the average bitrate by up to 5% with insignificant quality degradation.
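The abstract mentions a methodology for evaluating statistical significance among treatments but does not describe it. As a generic illustration only (not the paper's methodology), a two-sided permutation test is one standard way to ask whether two treatments, for example an original transcode and a reduced-bitrate transcode, received different mean ratings. The helper name and its parameters are hypothetical, and the test assumes the two rating samples are independent.

```python
import random


def permutation_test(ratings_a, ratings_b, n_iter=10000, seed=0):
    """Two-sided permutation test on the difference of mean ratings.

    Returns (observed_difference, p_value). Hypothetical helper; assumes
    the two treatments were rated by independent samples of raters.
    """
    rng = random.Random(seed)
    n_a = len(ratings_a)
    observed = sum(ratings_a) / n_a - sum(ratings_b) / len(ratings_b)
    pooled = list(ratings_a) + list(ratings_b)
    extreme = 0
    for _ in range(n_iter):
        # Under the null hypothesis the treatment labels are exchangeable.
        rng.shuffle(pooled)
        a, b = pooled[:n_a], pooled[n_a:]
        diff = sum(a) / len(a) - sum(b) / len(b)
        # Small tolerance guards against summation-order rounding differences.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_iter + 1)
    return observed, p_value
```

A large p-value would be consistent with "insignificant quality degradation" between treatments. If the same raters scored both treatments, a paired test (for example, randomly flipping the sign of per-rater differences) would be the more appropriate design.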
    User Generated Content (UGC) has received a lot of interest in academia and industry recently. To facilitate compression-related research on UGC, YouTube has released a large-scale dataset [Wang2019UGCDataset]. The initial dataset only provided raw videos, which made it difficult to use for quality assessment. In this paper, we built a crowd-sourcing platform to collect and clean up subjective quality scores for the YouTube UGC dataset, and analyzed the distribution of Mean Opinion Score (MOS) along various dimensions. Some fundamental questions in video quality assessment are also investigated, such as the correlation between full-video MOS and the corresponding chunk MOS, and the influence of chunk variation on quality score aggregation.
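To make the terms in this abstract concrete, here is a minimal sketch of computing a MOS with a normal-approximation 95% confidence interval from raw crowd-sourced ratings, and of pooling per-chunk MOS into a single video-level score. The two pooling strategies (mean and worst-chunk) are illustrative assumptions; the abstract does not say which aggregation the paper studied.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Mean Opinion Score with a normal-approximation confidence interval.

    Requires at least two ratings (statistics.stdev needs n >= 2).
    """
    mos = statistics.mean(ratings)
    half = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, (mos - half, mos + half)


def aggregate_chunks(chunk_mos, method="mean"):
    """Pool per-chunk MOS values into one score (hypothetical strategies)."""
    if method == "mean":
        return statistics.mean(chunk_mos)
    if method == "min":
        # Worst chunk dominates: models viewers who penalize the worst segment.
        return min(chunk_mos)
    raise ValueError(f"unknown method: {method}")
```

Comparing each pooled value against the MOS collected for the full video (for example, with a Pearson correlation across videos) is one simple way to probe how well chunk scores predict full-video scores and how chunk variation affects the choice of pooling.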
    On the first JND and Break in Presence of 360-degree content: An exploratory study
    Roberto G. de A. Azevedo
    Ivan Janatra
    Pascal Frossard
    MMVE '19: Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems (2019)
    Unlike traditional planar 2D visual content, immersive 360-degree images and videos undergo particular processing steps and are intended to be consumed via head-mounted displays (HMDs). To get a deeper understanding of the perception of 360-degree visual distortions when consumed through HMDs, we perform an exploratory task-based subjective study in which we asked subjects to define the first noticeable difference and break-in-presence points while specific compression artifacts were incrementally added. The results of our study give insights into the range of allowed visual distortions for 360-degree content, show that the added visual distortions are more tolerable in mono than in stereoscopic 3D, and identify issues with current 360-degree objective quality metrics.
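One simple way to summarize per-subject first-noticeable-difference points from a study like this is an empirical cumulative fraction: for each artifact level, the share of subjects who had already noticed a difference at or below it. The sketch below assumes each subject reports a single numeric artifact level (the level scale and both function names are hypothetical); it illustrates the summary, not the paper's analysis.

```python
from bisect import bisect_right


def noticed_fraction(jnd_levels, level):
    """Fraction of subjects whose first noticeable difference is at or below `level`."""
    ordered = sorted(jnd_levels)
    return bisect_right(ordered, level) / len(ordered)


def max_tolerated_level(jnd_levels, levels, max_noticed=0.25):
    """Largest level in `levels` at which at most `max_noticed` of subjects had noticed.

    Returns None if even the smallest level exceeds the threshold.
    """
    ok = [lv for lv in levels if noticed_fraction(jnd_levels, lv) <= max_noticed]
    return max(ok) if ok else None
```

With hypothetical reports `[2, 3, 3, 5]`, a quarter of subjects have noticed by level 2 and three quarters by level 3, so `max_tolerated_level([2, 3, 3, 5], range(1, 7))` returns 2. Running the same summary separately on mono and stereoscopic sessions would give one way to compare tolerance across viewing modes; the same approach applies to break-in-presence points.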
    Visual Distortions in 360-degree Videos
    Roberto G. de A. Azevedo
    Francesca De Simone
    Ivan Janatra
    Pascal Frossard
    IEEE Transactions on Circuits and Systems for Video Technology (2019)
    Omnidirectional (or 360-degree) images and videos are emerging signals used in many areas, such as robotics and virtual/augmented reality. In particular, for virtual reality applications, they allow an immersive experience in which the user can interactively navigate through a scene with three degrees of freedom while wearing a head-mounted display. Current approaches for capturing, processing, delivering, and displaying 360-degree content, however, present many open technical challenges and introduce several types of distortions in the visual signal. Some of these distortions are specific to the nature of 360-degree images and often differ from those encountered in classical visual communication frameworks. This paper provides a first comprehensive review of the most common visual distortions that alter 360-degree signals as they pass through the different processing elements of the visual communication pipeline. While their impact on viewers' visual perception and on the immersive experience at large is still unknown, and thus remains an open research topic, this review proposes a taxonomy of the visual distortions that can be encountered in 360-degree signals. Their underlying causes in the end-to-end 360-degree content distribution pipeline are identified. This taxonomy is essential as a basis for comparing different processing techniques, such as visual enhancement, encoding, and streaming strategies, and for enabling the effective design of new algorithms and applications. It is also a useful resource for the design of psycho-visual studies aiming to characterize human perception of 360-degree content in interactive and immersive applications.
    Sasi Inguva
    2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP)
    User Generated Content (UGC) is becoming increasingly popular in today's video sharing applications. However, little public UGC data is available for video compression and quality assessment research. In this paper, a large-scale UGC dataset is introduced, which is sampled from millions of YouTube videos and covers the most popular categories, such as Gaming, Sports, and HDR. Besides a novel sampling method based on features extracted from transcoding, the paper also addresses challenges for UGC compression and quality evaluation. We also release three no-reference quality metrics for the UGC dataset, which overcome certain shortcomings of traditional reference metrics on UGC.
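The abstract mentions a novel sampling method based on features extracted from transcoding but does not describe it. As a loose, generic illustration only, here is stratified sampling over one scalar feature, so that a dataset covers the whole range of that feature rather than just its most common values. The feature (for example, a hypothetical per-video complexity score derived from a reference transcode), the equal-width binning, the per-bin quota, and the function name are all assumptions, not the paper's method.

```python
import random
from collections import defaultdict


def stratified_sample(videos, feature, n_bins, per_bin, seed=0):
    """Draw up to `per_bin` items from each of `n_bins` equal-width bins of feature(v)."""
    values = [feature(v) for v in videos]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero if all values are equal
    bins = defaultdict(list)
    for v, x in zip(videos, values):
        # Clamp so the maximum value lands in the last bin instead of overflowing.
        idx = min(int((x - lo) / width), n_bins - 1)
        bins[idx].append(v)
    rng = random.Random(seed)
    sample = []
    for idx in sorted(bins):
        members = bins[idx]
        sample.extend(rng.sample(members, min(per_bin, len(members))))
    return sample
```

For instance, with 100 items whose feature values are 0 through 99, four bins, and two items per bin, the result contains exactly two items from each quarter of the feature range.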
    Deformable block based motion estimation in omnidirectional image sequences
    Francesca De Simone
    Pascal Frossard
    IEEE 19th International Workshop on Multimedia Signal Processing (2017)
    This paper presents an extension of block-based motion estimation for omnidirectional videos, based on a camera and translational object motion model that accounts for the spherical geometry of the imaging system. We use this model to design a new algorithm to perform block matching in sequences of panoramic frames that result from the equirectangular projection. Experimental results demonstrate that significant gains in motion prediction accuracy can be achieved with respect to the classical exhaustive block matching algorithm (EBMA). In particular, on the predicted frames, average quality improvements of up to approximately 6 dB in Peak Signal-to-Noise Ratio (PSNR), 0.043 in Structural SIMilarity index (SSIM), and 2 dB in spherical PSNR can be achieved.
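The abstract measures its gains against the classical exhaustive block matching algorithm (EBMA) using PSNR and spherical PSNR. Below is a minimal stdlib sketch of the EBMA baseline (not the paper's deformable-block extension) together with a PSNR helper. Frames are assumed to be lists of rows of luma values. The optional row weighting follows the WS-PSNR convention for equirectangular frames (cosine of each row's latitude); treating that as the paper's "spherical PSNR" is an assumption, since the two metrics may be computed differently.

```python
import math


def ebma(ref, cur, top, left, block=8, search=4):
    """Exhaustive block matching: test every displacement (dy, dx) within
    +/- `search` pixels and keep the one minimizing the sum of absolute
    differences (SAD) between cur's block at (top, left) and a block of ref."""
    h, w = len(ref), len(ref[0])
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue  # candidate block falls outside the reference frame
            sad = sum(
                abs(cur[top + i][left + j] - ref[y + i][x + j])
                for i in range(block)
                for j in range(block)
            )
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad


def psnr(a, b, peak=255.0, weights=None):
    """PSNR in dB; optional per-row `weights` give a latitude-weighted variant."""
    if weights is None:
        weights = [1.0] * len(a)
    num = den = 0.0
    for row_a, row_b, wt in zip(a, b, weights):
        for p, q in zip(row_a, row_b):
            num += wt * (p - q) ** 2
            den += wt
    mse = num / den
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def erp_row_weights(h):
    """cos(latitude) at each row center of an h-row equirectangular frame."""
    return [math.cos((i + 0.5 - h / 2) * math.pi / h) for i in range(h)]
```

Rows near the poles of an equirectangular frame are heavily oversampled, so down-weighting them with `erp_row_weights(len(frame))` keeps distortion there from dominating the score, which is the motivation for sphere-aware variants of PSNR.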