Sandwiched Video Compression: An Efficient Learned Video Compression Approach
Abstract
We propose sandwiched video compression, a video compression framework that wraps neural networks around a standard video codec. The framework consists of a neural pre-processor, a neural post-processor, and a standard video codec between them, trained jointly to optimize a rate-distortion loss function. Training such a framework end-to-end requires a differentiable proxy for the standard video codec, which is significantly more challenging than designing image codec proxies due to temporal processing such as motion prediction, inter/intra mode decisions, and in-loop filtering. In this work, we propose a computationally efficient way of approximating a video codec and demonstrate that the neural codes generated by the neural pre-processor can be compressed to a better rate-distortion point than the original frames in the input video. More precisely, sandwiched HEVC YUV 4:4:4 in low-resolution mode and sandwiched HEVC YUV 4:0:0 show around 6.5 dB and 8 dB improvements over the standard HEVC in the same mode and format, respectively. Moreover, when optimized for and tested with a perceptual similarity metric, Learned Perceptual Image Patch Similarity (LPIPS), we observe 30% to 40% improvement over the standard HEVC YUV 4:4:4, depending on the rate.
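
To make the sandwich structure and the rate-distortion objective concrete, the following toy sketch evaluates a loss of the form D + λR for a pre-processor, a codec stand-in, and a post-processor applied in sequence. Everything in it is an illustrative assumption rather than the method of this paper: the function names, the single scalar `gain` standing in for both neural networks, the uniform quantizer standing in for the codec, and the `log2(1 + |q|)` rate estimate are all hypothetical. The hard rounding used here is also not differentiable, so this shows only the forward evaluation of the loss, not a trainable proxy.

```python
import math

def pre_process(frame, gain):
    # Hypothetical stand-in for the neural pre-processor: one scalar gain.
    return [gain * x for x in frame]

def codec_stand_in(codes, step):
    # Toy stand-in for the video codec: uniform quantization with step size
    # `step`, plus a crude rate estimate of log2(1 + |index|) bits per sample.
    indices = [round(c / step) for c in codes]
    recon = [q * step for q in indices]
    rate = sum(math.log2(1 + abs(q)) for q in indices)
    return recon, rate

def post_process(recon, gain):
    # Hypothetical stand-in for the neural post-processor: undo the gain.
    return [x / gain for x in recon]

def rd_loss(frame, gain, step, lam):
    # Sandwich: pre-processor -> codec stand-in -> post-processor.
    codes = pre_process(frame, gain)
    recon, rate = codec_stand_in(codes, step)
    output = post_process(recon, gain)
    # Distortion is the mean squared error against the original frame.
    distortion = sum((a - b) ** 2 for a, b in zip(frame, output)) / len(frame)
    return distortion + lam * rate, distortion, rate

if __name__ == "__main__":
    frame = [0.1, 0.5, 0.9]  # a tiny "frame" of three samples
    for gain in (1.0, 4.0):
        loss, d, r = rd_loss(frame, gain, step=0.5, lam=0.01)
        print(f"gain={gain}: distortion={d:.6f}, rate={r:.3f}, loss={loss:.6f}")
```

In this toy setting, raising the gain spends more bits to lower the distortion, which is the trade-off the multiplier `lam` arbitrates. Joint training would additionally require replacing the hard rounding with a differentiable approximation so that gradients reach the pre-processor.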