Nick Johnston
Research Areas
Authored Publications
Sort By
Preview abstract
The rate-distortion performance of neural image compression models has exceeded the state-of-the-art of non-learned codecs, but neural codecs are still far from widespread deployment and adoption. The largest obstacle is having efficient models that are feasible on a wide variety of consumer hardware. Comparative research and evaluation is difficult because of the lack of standard benchmarking platforms and by variations in hardware architectures and test environments.Through our rate-distortion-computation (RDC) study we demonstrate that neither floating-point operations (FLOPs) nor runtime are sufficient on their own to accurately rank neural compression methods. We also explore the RDC frontier, which leads to a family of model architectures with the best empirical trade-off between computational requirements and RD performance. Finally, we identify a novel neural compression architecture that yields state-of-the-art RD performance with rate savings of 23.1% over BPG (7.0% overVTM and 3.0% over ELIC) without requiring significantly more FLOPs than other learning-based codecs
View details
Neural Video Compression using GANs for Detail Synthesis and Propagation
European Conference on Computer Vision (2022)
Preview abstract
We present the first neural video compression method based on generative adversarial networks (GANs). Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state-of-the-art in visual quality for neural methods. We show that the GAN loss is crucial to obtain this high visual quality. Two components make the GAN loss effective: we i) synthesize detail by conditioning the generator on a latent extracted from the warped previous reconstruction to then ii) propagate this detail with high-quality flow. We find that user studies are required to compare methods, i.e., none of our quantitative metrics were able to predict all studies. We present the network design choices in detail, and ablate them with user studies.
View details
Preview abstract
Compression is essential to storing and transmitting medical videos, but the effect of compression artifacts on downstream medical tasks is often ignored. Furthermore, systems in practice rely on standard video codecs, which naively allocate bits evenly between medically interesting and uninteresting frames and parts of frames. In this work, we present an empirical study of some deficiencies of classical codecs on gastroenterology videos, and motivate our ongoing work to train a learned compression model for colonoscopy videos, which we call ``GastroEnterology Aware Compression" (GEAC). We show that H264 and HEVC, two of the most common classical codecs, perform worse on the most medically-relevant frames. We also show that polyp detector performance degrades rapidly as compression increases, and explain why a learned compressor would degrade more gracefully. Many of our proposed techniques generalize to medical video domains beyond gastroenterology.
View details
Nonlinear Transform Coding
Philip A. Chou
Sung Jin Hwang
IEEE Trans. on Special Topics in Signal Processing, 15 (2021) (to appear)
Preview abstract
We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate–distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate–distortion performance of NTC with the help of simple example sources, for which the optimal performance of a vector quantizer is easier to estimate than with natural data sources. To this end, we introduce a novel variant of entropy-constrained vector quantization. We provide an analysis of various forms of stochastic optimization techniques for NTC models; review architectures of transforms based on artificial neural networks, as well as learned entropy models; and provide a direct comparison of a number of methods to parameterize the rate–distortion trade-off of nonlinear transforms, introducing a simplified one.
View details
End-to-end Learning of Compressible Features
Abhinav Shrivastava
2020 IEEE Int. Conf. on Image Processing (ICIP)
Preview abstract
Pre-trained convolutional neural networks (CNNs) are very powerful as an off the shelf feature generator and have been shown to perform very well on a variety of tasks. Unfortunately, the generated features are high dimensional and expensive to store: potentially hundreds of thousands of floats per example when processing videos. Traditional entropy based lossless compression methods are of little help as they do not yield desired level of compression while general purpose lossy alternatives (e.g. dimensionality reduction techniques) are sub-optimal as they end up losing important information. We propose a learned method that jointly optimizes for compressibility along with the original objective for learning the features. The plug-in nature of our method makes it straight-forward to integrate with any target objective and trade-off against compressibility. We present results on multiple benchmarks and demonstrate that features learned by our method maintain their informativeness while being order of magnitude more compressible.
View details
Scale-Space Flow for End-to-End Optimized Video Compression
Sung Jin Hwang
2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR)
Preview abstract
Despite considerable progress on end-to-end optimized deep networks for image
compression, video coding remains a challenging task. Recently proposed
methods for learned video compression use optical flow and bilinear warping
for motion compensation and show competitive rate-distortion performance
relative to hand-engineered codecs like H.264 and HEVC. However, these
learning-based methods rely on complex architectures and training schemes
including the use of pre-trained optical flow networks, sequential training of
sub-networks, adaptive rate control, and buffering intermediate
reconstructions to disk during training. In this paper, we show that a
generalized warping operator that better handles common failure cases,
e.g. disocclusions and fast motion, can provide competitive compression
results with a greatly simplified model and training procedure. Specifically,
we propose scale-space flow, an intuitive generalization of optical
flow that adds a scale parameter to allow the network to better model
uncertainty. Our experiments show that a low-latency video compression model
(no B-frames) using scale-space flow for motion compensation can outperform
analogous state-of-the art learned video compression models while being
trained using a much simpler procedure and without any pre-trained optical
flow networks.
View details
Preview abstract
Image compression using neural networks have reached or exceeded non-neural methods (such as JPEG, WebP, BPG). While these networks are state of the art in rate-distortion performance, computational feasibility of these models remains a challenge. Our work provides three novel contributions. We propose a run-time improvement to the Generalized Divisive Normalization formulation, a regularization technique targeted to optimizing neural image decoders, and an analysis of
the trade offs in 207 architecture variations across multiple distortion loss functions to recommend an architecture that is twice as fast while maintaining state-of-the-art image compression performance.
View details
Neural Image Decompression: Learning to Render Better Image Previews
Michele Covell
2019 IEEE International Conference on Image Processing, IEEE
Preview abstract
A rapidly increasing portion of Internet traffic is dominated by requests from mobile devices with limited- and metered-bandwidth constraints. To satisfy these requests, it has become standard practice for websites to transmit small and extremely compressed image previews as part of the initial page-load process. Recent work, based on an adaptive triangulation of the target image, has shown the ability to generate thumbnails of full images at extreme compression rates: 200 bytes or less with impressive gains (in terms of PSNR and SSIM) over both JPEG and WebP standards. However, qualitative assessments and preservation of semantic content can be less favorable. We present a novel method to significantly improve the reconstruction quality of the original image with no changes to the encoded information. Our neural-based decoding not only achieves higher PSNR and SSIM scores than the original methods, but also yields a substantial increase in semantic-level content preservation. In addition, by keeping the same encoding stream, our solution is completely inter-operable with the original decoder. The end result is suitable for a range of small-device deployments, as it involves only a single forward-pass through a small, scalable network.
View details
Preview abstract
We consider the problem of using variational latent-variable models for data compression. For such models to produce a compressed binary sequence, which is the universal data representation in a digital world, the latent representation needs to be subjected to entropy coding. Range coding as an entropy coding technique is optimal, but it can fail catastrophically if the computation of the prior differs even slightly between the sending and the receiving side. Unfortunately, this is a common scenario when floating point math is used and the sender and receiver operate on different hardware or software platforms, as numerical round-off is often platform dependent. We propose using integer networks as a universal solution to this problem, and demonstrate that they enable reliable cross-platform encoding and decoding of images using variational models.
View details