Yaowu Xu

Dr. Yaowu Xu is currently a Distinguished Software Engineer at Google, leading Google's Open Codecs Group. Dr. Xu’s group is responsible for developing the core technology that enables the delivery of Video, Image, Audio, and 3D Reality over the internet, powering a broad range of products and services at Google and the industry, such as YouTube, Meet, Photos, Image Search, Ads, and AR Shopping. World-class experts in his group have been the driving force of many open-source media compression projects such as VP9, AV1, AV2, WebP, AVIF, Lyra, and Draco. Besides leading engineering teams and projects, Dr. Xu was the executive sponsor of Chrome & Photos' mentoring program, a faculty member of Google's Manager Development program that trains Google's people managers, and a volunteer mentor to Google’s talents at various career stages. Prior to joining Google, Dr. Xu was the vice president of codec development at On2 Technologies. Dr. Xu holds a Ph.D. degree in Nuclear Engineering from Tsinghua University in Beijing, China, and earned a Ph.D. degree in Electrical and Computer Engineering from the University of Rochester in 2003. He has been granted over two hundred patents.
Authored Publications
    This paper proposes a novel bi-directional motion compensation framework that extracts existing motion information associated with the reference frames and interpolates an additional reference frame candidate that is co-located with the current frame. The approach generates a dense motion field by performing optical flow estimation, so as to capture complex motion between the reference frames without recourse to additional side information. The estimated optical flow is then complemented by transmission of offset motion vectors to correct for possible deviation from the linearity assumption in the interpolation. Various optimization schemes specifically tailored to the video coding framework are presented to further improve the performance. To accommodate applications where decoder complexity is a cardinal concern, a block-constrained speed-up algorithm is also proposed. Experimental results show that the main approach and optimization methods yield significant coding gains across a diverse set of video sequences. Further experiments focus on the trade-off between performance and complexity, and demonstrate that the proposed speed-up algorithm offers complexity reduction by a large factor while maintaining most of the performance gains.
    Selecting among multiple transform kernels to code prediction residuals is widely used for better compression efficiency. Conventionally, the encoder performs trials of each transform to estimate the rate-distortion (R-D) cost. However, such an exhaustive approach suffers from a significant increase in complexity due to the excessive trials. In this paper, a novel rate estimation approach is proposed to bypass the entropy coding process for each transform type using the conditional Laplace distribution model. The proposed method estimates the Laplace distribution parameter from the context inferred by the quantization level and finds the expected rate of the coefficient for transform type selection. Furthermore, a greedy search algorithm for separable transforms is also presented to further accelerate the process. Experimental results show that transform type selection using the proposed rate estimation method achieves high accuracy at lower complexity.
    An overview of core coding tools in the AV1 video codec
    Adrian Grange
    Andrey Norkin
    Ching-Han Chiang
    Hui Su
    Jean-Marc Valin
    Luc Trudeau
    Nathan Egge
    Paul Wilkins
    Peter de Rivaz
    Sarah Parker
    Steinar Midtskogen
    Thomas Davies
    Zoe Liu
    The Picture Coding Symposium (PCS) (2018)
    AV1 is an emerging open-source and royalty-free video compression format, which was jointly developed and finalized in early 2018 by the Alliance for Open Media (AOMedia) industry consortium. The main goal of AV1 development is to achieve substantial compression gain over state-of-the-art codecs while maintaining practical decoding complexity and hardware feasibility. This paper provides a brief technical overview of key coding techniques in AV1 along with a preliminary compression performance comparison against VP9 and HEVC.
    Novel inter and intra prediction tools under consideration for the emerging AV1 video codec
    Sarah Parker
    Hui Su
    Angie Chiang
    Zoe Liu
    Chen Wang
    Emil Keyder
    SPIE Optical Engineering + Applications, 10396 (2017), 10396 - 10396 - 13
    Google started the WebM Project in 2010 to develop open-source, royalty-free video codecs designed specifically for media on the Web. The second-generation codec released by the WebM project, VP9, is currently served by YouTube, and enjoys billions of views per day. Realizing the need for even greater compression efficiency to cope with the growing demand for video on the web, the WebM team embarked on an ambitious project to develop a next-edition codec, AV1, within a consortium of major tech companies called the Alliance for Open Media, that achieves at least a generational improvement in coding efficiency over VP9. In this paper, we focus primarily on new tools in AV1 that improve the prediction of pixel blocks before transforms, quantization and entropy coding are invoked. Specifically, we describe tools and coding modes that improve intra, inter and combined inter-intra prediction. Results are presented on standard test sets.
    Video codecs exploit the temporal redundancy of the video signal, in the form of motion compensated prediction, to achieve superior compression performance. The coding of motion vectors takes a large portion of the total rate cost. Prior research utilizes the spatial and temporal correlations of the motion field to improve the coding efficiency of the motion information. It typically constructs a candidate pool composed of a fixed number of reference motion vectors and allows the codec to select and reuse the one that best approximates the motion activity of the current block. This largely disconnects the entropy coding process from the true boundary conditions, since they are masked by the fixed-length candidate list, and hence could potentially cause sub-optimal coding performance. An alternative motion vector referencing scheme is proposed in this work to fully accommodate the dynamic nature of the boundary conditions for compression efficiency. It adaptively extends or shortens the candidate list according to the actual number of available reference motion vectors. The associated probability model accounts for the likelihood that an individual motion vector candidate is used. A complementary motion vector candidate ranking system is also presented here. It is experimentally shown that the proposed scheme achieves considerable compression performance gains across all the test sets.
    Screen content videos, which typically contain computer-generated text and graphics, are in increasing demand in today's online video services. They exhibit many characteristics that are not commonly seen in natural videos, including sharp edge transitions and repetitive patterns, which make their statistical characteristics distinct from those of natural videos. This calls into question the efficacy of the conventional discrete cosine transform (DCT), which builds on the Gauss-Markov model assumption that leads to a base-band signal, for coding computer-generated graphics. This work exploits a class of staircase transforms. Unlike the DCT, whose bases are samplings of sinusoidal functions, the staircase transforms have their bases sampled from staircase functions, which naturally better approximate the sharp transitions often encountered in the context of screen content. As an alternative transform kernel, the staircase transform is integrated into a hybrid transform coding scheme, in conjunction with DCT. It is experimentally shown that the proposed approach provides an average of 2.9% compression performance gains in terms of BD-rate reduction. A perceptual comparison further demonstrates that the use of the staircase transform achieves a substantial reduction in ringing artifacts due to the Gibbs phenomenon.
    An estimation-theoretic approach to video denoising
    Timothy Kopp
    2015 IEEE International Conference on Image Processing, IEEE, pp. 4273-4277
    A novel denoising scheme is proposed to fully exploit the spatio-temporal correlations of the video signal for efficient enhancement. Unlike conventional pixel domain approaches that directly connect motion compensated reference pixels and spatially neighboring pixels to build statistical models for noise filtering, this work first removes spatial correlations by applying transformations to both pixel blocks and performs estimation in the frequency domain. It is premised on the realization that the precise nature of temporal dependencies, which is entirely masked in the pixel domain by the statistics of the dominant low frequency components, emerges after signal decomposition and varies considerably across the spectrum. We derive an optimal non-linear estimator that accounts for both motion compensated reference and the noisy observations to resemble the original video signal per transform coefficient. It departs from other transform domain approaches that employ linear filters over a sizable reference set to reduce the uncertainty due to the random noise term. Instead, it jointly exploits this precise statistical property that appears in the transform domain and the noise probability model in an estimation-theoretic framework that works on a compact support region. Experimental results provide evidence for substantial denoising performance improvement.
    Template matching prediction is an established approach to intra-frame coding that makes use of previously coded pixels in the same frame for reference. It compares the previously reconstructed upper and left boundaries to search the reference area for the best-matched block for prediction, and hence eliminates the need to send additional information to reproduce the same prediction at the decoder. Viewing the image signal as an auto-regressive model, this work is premised on the fact that pixels closer to the known block boundary are better predicted than those far apart. It significantly extends the scope of the template matching approach, which is typically followed by a conventional discrete cosine transform (DCT) for the prediction residuals, by employing an asymmetric discrete sine transform (ADST), whose basis functions vanish at the prediction boundary and reach maximum magnitude at the far end, to fully exploit the statistics of the residual signals. It was experimentally shown that the proposed scheme provides substantial coding performance gains on top of the conventional template matching method over the baseline.
    The hybrid transform coding scheme, which alternates between the asymmetric discrete sine transform (ADST) and the discrete cosine transform (DCT) depending on the boundary prediction conditions, is an efficient tool for video and image compression. It optimally exploits the statistical characteristics of the prediction residual, thereby achieving significant coding performance gains over the conventional DCT-based approach. A practical concern lies in the intrinsic conflict between the transform kernels of ADST and DCT, which prevents a butterfly structured implementation for parallel computing. Hence the hybrid transform coding scheme has to rely on matrix multiplication, which presents a speed-up barrier due to under-utilization of the hardware, especially for larger block sizes. In this work, we devise a novel ADST-like transform whose kernel is consistent with that of DCT, thereby enabling butterfly structured computation flow, while largely retaining the performance advantages of the hybrid transform coding scheme in terms of compression efficiency. A prototype implementation of the proposed butterfly structured hybrid transform coding scheme is available in the VP9 codec repository.