Yilin Wang

Yilin Wang received the B.S. and M.S. degrees in Computer Science from Nanjing University, China, and the Ph.D. degree from The University of North Carolina at Chapel Hill. He joined the YouTube Media Algorithms team in 2014. His research interests include video processing, visual quality assessment, video compression, machine learning, and optimization.
Authored Publications
    MUSIQ: Multi-scale Image Quality Transformer
    Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
    Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ, SPAQ, and KonIQ-10k.
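    The hash-based 2D spatial embedding lets images with different patch-grid shapes share one fixed-size embedding table. The sketch below assumes the mapping works by scaling each patch row/column index onto a fixed G x G grid, i.e. floor(i * G / h) and floor(j * G / w). The function names, the grid size G = 10, and the toy 1-d "embeddings" are hypothetical. A real model would use a learnable table and also add a separate scale embedding, which is not shown here.

```python
def hash_spatial_indices(h, w, grid_size):
    """Map every patch position (i, j) of an h x w patch grid to an index
    into a fixed grid_size x grid_size table of spatial embeddings."""
    return [
        [(i * grid_size // h) * grid_size + (j * grid_size // w) for j in range(w)]
        for i in range(h)
    ]


def lookup_spatial_embeddings(h, w, table, grid_size):
    """Gather one embedding vector per patch from a shared table with
    grid_size * grid_size rows (learnable in a real model; plain lists here)."""
    idx = hash_spatial_indices(h, w, grid_size)
    return [[table[k] for k in row] for row in idx]


# Two inputs with different patch-grid shapes share one 10 x 10 table.
G = 10
table = [[float(k)] for k in range(G * G)]  # toy 1-d "embeddings"
small = lookup_spatial_embeddings(4, 6, table, G)   # e.g. a downscaled input
full = lookup_spatial_embeddings(10, 10, table, G)  # e.g. a native-resolution input
```

    With this design the table size stays constant regardless of input resolution or aspect ratio, which is what allows native-resolution inputs without resizing to a fixed shape.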
    Video quality assessment for User Generated Content (UGC) is an important topic in both industry and academia. Most existing methods only focus on one aspect of perceptual quality assessment, such as technical quality or compression artifacts. In this paper, we create a large-scale dataset to comprehensively investigate the characteristics of generic UGC video quality. Besides the subjective ratings and content labels of the dataset, we also propose a DNN-based framework to thoroughly analyze the importance of content, technical quality, and compression level in perceptual quality. Our model is able to provide quality scores as well as human-friendly quality indicators, to bridge the gap between low-level video signals and human perceptual quality. Experimental results show that our model achieves state-of-the-art correlation with Mean Opinion Scores (MOS).
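    "Correlation with MOS" is commonly reported in quality-assessment work as Spearman rank-order correlation (SROCC) and Pearson linear correlation (PLCC) between predicted scores and MOS. The abstract does not say which measures were used, so this is an assumption; the sketch below is a plain-Python illustration, not the paper's evaluation code.

```python
import math


def pearson(x, y):
    """Pearson linear correlation (PLCC). Assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): PLCC computed on ranks."""
    return pearson(ranks(x), ranks(y))
```

    SROCC only rewards getting the ordering right, so a monotonic but non-linear predictor scores 1.0 on SROCC while its PLCC stays below 1.0.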
    Today's video transcoding pipelines choose transcoding parameters based on Rate-Distortion curves, which mainly focus on the relative quality difference between original and transcoded videos. By investigating the recently released YouTube UGC dataset, we found that people were more tolerant of quality changes in low-quality inputs than in high-quality inputs, which suggests that the current transcoding framework could be further optimized by considering input perceptual quality. An efficient machine-learning-based metric was proposed to detect low-quality inputs, whose bitrate can be further reduced without hurting perceptual quality. To evaluate the impact on perceptual quality, we conducted a crowd-sourcing subjective experiment and provided a methodology to evaluate statistical significance among different treatments. The results showed that the proposed quality-guided transcoding framework is able to reduce the average bitrate by up to 5% with insignificant quality degradation.
    GIFnets: An end-to-end neural network based GIF encoding framework
    Innfarn Yoo
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle (2020)
    Graphics Interchange Format (GIF) is a widely used image file format. Due to the limited number of palette colors, GIF encoding often introduces color banding artifacts. Traditionally, dithering is applied to reduce color banding, but it introduces dotted-pattern artifacts. To reduce artifacts and provide better and more efficient GIF encoding, we introduce a differentiable GIF encoding pipeline, which includes three novel neural networks: PaletteNet, DitherNet, and BandingNet. Each of these three networks provides an important functionality within the GIF encoding pipeline. PaletteNet predicts a near-optimal color palette given an input image. DitherNet manipulates the input image to reduce color banding artifacts and provides an alternative to traditional dithering. Finally, BandingNet is designed to detect color banding, and provides a new perceptual loss specifically for GIF images. As far as we know, this is the first fully differentiable GIF encoding pipeline based on deep neural networks and compatible with existing GIF decoders. A user study shows that our algorithm is better than Floyd-Steinberg based GIF encoding.
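    For context, the Floyd-Steinberg baseline that GIFnets is compared against is classic error-diffusion dithering: each pixel is snapped to its nearest palette entry and the quantization error is pushed onto unvisited neighbours with weights 7/16, 3/16, 5/16, and 1/16. The sketch below is that textbook algorithm on a grayscale image with a palette given as input (choosing the palette is the job PaletteNet learns in the paper). It is not the GIFnets method, and the grayscale simplification is an assumption for brevity.

```python
def floyd_steinberg(image, palette):
    """image: list of rows of grayscale values; palette: list of gray levels.
    Returns a new image in which every pixel is a palette entry."""
    h, w = len(image), len(image[0])
    img = [[float(v) for v in row] for row in image]  # working copy accumulates error
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = min(palette, key=lambda p: abs(p - old))  # nearest palette level
            out[y][x] = new
            err = old - new
            # Diffuse the quantization error to pixels not yet visited.
            if x + 1 < w:
                img[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1][x - 1] += err * 3 / 16
                img[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1][x + 1] += err * 1 / 16
    return out


flat_gray = [[128] * 8 for _ in range(8)]
dithered = floyd_steinberg(flat_gray, [0, 255])
```

    A flat mid-gray region comes out as an interleaved pattern of black and white pixels whose average stays close to the input gray; that dotted texture is the kind of artifact the abstract attributes to traditional dithering.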
    User Generated Content (UGC) has received a lot of interest in academia and industry recently. To facilitate compression-related research on UGC, YouTube has released a large-scale dataset [Wang2019UGCDataset]. The initial dataset only provided raw videos, which made it difficult to use for quality assessment. In this paper, we built a crowd-sourcing platform to collect and clean up subjective quality scores for the YouTube UGC dataset, and analyzed the distribution of Mean Opinion Score (MOS) in various dimensions. Some fundamental questions in video quality assessment are also investigated, such as the correlation between full-video MOS and the corresponding chunk MOS, and the influence of chunk variation on quality score aggregation.
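    A Mean Opinion Score is the average of individual ratings for one stimulus. To make the chunk-aggregation question concrete, the sketch below computes per-chunk MOS and pools them into one video-level score. The "mean" and "min" pooling options and the function names are hypothetical illustrations; the abstract does not state which aggregation the paper uses, and no rater clean-up step is shown.

```python
from statistics import mean


def mos(ratings):
    """Mean Opinion Score: arithmetic mean of individual ratings (e.g. on a 1-5 scale)."""
    return mean(ratings)


def pooled_video_score(chunk_ratings, pooling="mean"):
    """Aggregate per-chunk MOS into one video-level score.
    'mean' and 'min' pooling are illustrative choices, not the paper's method."""
    chunk_mos = [mos(r) for r in chunk_ratings]
    if pooling == "mean":
        return mean(chunk_mos)
    if pooling == "min":
        return min(chunk_mos)
    raise ValueError(f"unknown pooling: {pooling}")


# Three chunks of one video; the last chunk is clearly worse than the others.
chunks = [[4, 5, 4], [4, 4, 5], [2, 1, 2]]
by_mean = pooled_video_score(chunks, "mean")
by_min = pooled_video_score(chunks, "min")
```

    When quality varies strongly between chunks, the two pooling rules diverge (about 3.44 versus 1.67 here), which is why chunk variation matters when comparing chunk MOS with full-video MOS.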
    YouTube UGC Dataset for Video Compression Research
    Sasi Inguva
    2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP)
    User Generated Content (UGC) is becoming more and more popular in today's video sharing applications. However, few public UGC datasets are available for video compression and quality assessment research. In this paper, a large-scale UGC dataset is introduced, which is sampled from millions of YouTube videos and covers the most popular categories, such as Gaming, Sports, and HDR. Besides a novel sampling method based on features extracted from transcoding, challenges for UGC compression and quality evaluation are also addressed. We also released three no-reference quality metrics for the UGC dataset, which overcome certain shortcomings of traditional reference metrics on UGC.
    A Perceptual Visibility Metric for Banding Artifacts
    Sang-Uok Kum
    Chao Chen
    Anil Kokaram
    IEEE International Conference on Image Processing (2016)
    Banding is a common video artifact caused by compressing low-texture regions with coarse quantization. Relatively few previous attempts exist to address banding, and none incorporate subjective testing for calibrating the measurement. In this paper, we propose a novel metric that incorporates both edge length and contrast across the edge to measure video banding. We further introduce both reference and non-reference metrics. Our results demonstrate that the new metrics have a very high correlation with subjective assessment and clearly outperform PSNR, SSIM, and VQM.