Scale-Space Flow for End-to-End Optimized Video Compression

Johannes Ballé
Sung Jin Hwang
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Abstract

Despite considerable progress on end-to-end optimized deep networks for image
compression, video coding remains a challenging task. Recently proposed
methods for learned video compression use optical flow and bilinear warping
for motion compensation and show competitive rate-distortion performance
relative to hand-engineered codecs like H.264 and HEVC. However, these
learning-based methods rely on complex architectures and training schemes,
including the use of pre-trained optical flow networks, sequential training of
sub-networks, adaptive rate control, and buffering intermediate
reconstructions to disk during training. In this paper, we show that a
generalized warping operator that better handles common failure cases,
e.g., disocclusions and fast motion, can provide competitive compression
results with a greatly simplified model and training procedure. Specifically,
we propose scale-space flow, an intuitive generalization of optical
flow that adds a scale parameter to allow the network to better model
uncertainty. Our experiments show that a low-latency video compression model
(no B-frames) using scale-space flow for motion compensation can outperform
analogous state-of-the-art learned video compression models while being
trained using a much simpler procedure and without any pre-trained optical
flow networks.
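
To make the idea concrete, below is a minimal sketch (not the authors' implementation) of scale-space warping in PyTorch: the reference frame is blurred at a fixed set of scales into a volume, and a three-channel flow field (dx, dy, scale) selects a spatial position and a blur level via trilinear interpolation. The function names, the sigma schedule, and the [0, 1] parameterization of the scale channel are assumptions chosen for illustration.

import torch
import torch.nn.functional as F

def gaussian_kernel1d(sigma, radius):
    x = torch.arange(-radius, radius + 1, dtype=torch.float32)
    k = torch.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(frame, sigma):
    """Separable Gaussian blur of an (N, C, H, W) tensor."""
    if sigma == 0:
        return frame
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel1d(sigma, radius).to(frame)
    c = frame.shape[1]
    kh = k.view(1, 1, 1, -1).expand(c, 1, 1, -1)  # horizontal pass
    kv = k.view(1, 1, -1, 1).expand(c, 1, -1, 1)  # vertical pass
    frame = F.conv2d(F.pad(frame, (radius, radius, 0, 0), mode='replicate'),
                     kh, groups=c)
    frame = F.conv2d(F.pad(frame, (0, 0, radius, radius), mode='replicate'),
                     kv, groups=c)
    return frame

def scale_space_warp(frame, flow, sigmas=(0.0, 1.0, 2.0, 4.0, 8.0)):
    """Warp `frame` (N, C, H, W) with a scale-space flow field
    `flow` (N, 3, H, W) holding (dx, dy) in pixels and scale in [0, 1].
    The sigma schedule here is an illustrative assumption."""
    n, c, h, w = frame.shape
    # Scale-space volume: progressively blurred copies of the frame.
    volume = torch.stack([blur(frame, s) for s in sigmas], dim=2)  # (N, C, D, H, W)
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=frame.device),
                            torch.linspace(-1, 1, w, device=frame.device),
                            indexing='ij')
    dx, dy, scale = flow[:, 0], flow[:, 1], flow[:, 2]
    gx = xs + 2.0 * dx / max(w - 1, 1)  # pixel displacement -> normalized
    gy = ys + 2.0 * dy / max(h - 1, 1)
    gz = 2.0 * scale - 1.0              # scale in [0, 1] -> normalized depth
    grid = torch.stack([gx, gy, gz], dim=-1).unsqueeze(1)  # (N, 1, H, W, 3)
    # Trilinear sampling jointly interpolates position and blur level.
    out = F.grid_sample(volume, grid, mode='bilinear',
                        padding_mode='border', align_corners=True)
    return out.squeeze(2)  # (N, C, H, W)

When the scale channel is zero everywhere, sampling stays in the unblurred plane of the volume and the operator reduces to ordinary bilinear warping with optical flow; larger scale values let the network fall back to a blurred prediction in regions such as disocclusions, where motion is uncertain.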