Johannes Ballé

I'm a Research Scientist at Google. My current work focuses on lossy image compression, rate–distortion optimization and models of visual perception. I defended my master's and doctoral theses on signal processing and image compression at RWTH Aachen University in 2007 and 2012, respectively, working with Jens-Rainer Ohm. This was followed by a brief collaboration with Javier Portilla at CSIC in Madrid, Spain, and a postdoctoral fellowship at New York University’s Center for Neural Science with Eero P. Simoncelli. There, I studied the relationship between perception and image statistics, and pioneered the use of variational Bayesian models and deep learning for end-to-end optimized image compression. I joined Google in early 2017 to continue working in this line of research. I've served as a reviewer for top-tier publications in both machine learning and image processing, such as NeurIPS, ICLR, ICML, Picture Coding Symposium and several IEEE Transactions. I've been a co-organizer of the annual Workshop and Challenge on Learned Image Compression (CLIC) since 2018. A full list of my publications is available on Google Scholar.
Authored Publications
    We present the first neural video compression method based on generative adversarial networks (GANs). Our approach significantly outperforms previous neural and non-neural video compression methods in a user study, setting a new state of the art in visual quality for neural methods. We show that the GAN loss is crucial to obtain this high visual quality. Two components make the GAN loss effective: we i) synthesize detail by conditioning the generator on a latent extracted from the warped previous reconstruction, and then ii) propagate this detail with high-quality flow. We find that user studies are required to compare methods, i.e., none of our quantitative metrics were able to predict all studies. We present the network design choices in detail, and ablate them with user studies.
    Nonlinear Transform Coding
    Philip A. Chou
    Sung Jin Hwang
    IEEE Journal of Selected Topics in Signal Processing, vol. 15 (2021) (to appear)
    We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate–distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate–distortion performance of NTC with the help of simple example sources, for which the optimal performance of a vector quantizer is easier to estimate than with natural data sources. To this end, we introduce a novel variant of entropy-constrained vector quantization. We provide an analysis of various forms of stochastic optimization techniques for NTC models; review architectures of transforms based on artificial neural networks, as well as learned entropy models; and provide a direct comparison of a number of methods to parameterize the rate–distortion trade-off of nonlinear transforms, introducing a simplified one.
    Connectomic reconstruction of neural circuits relies on nanometer-resolution microscopy, which produces on the order of a petabyte of imagery for each cubic millimeter of brain tissue. The cost of storing such data is a significant barrier to broadening the use of connectomic approaches and scaling to even larger volumes. We present an image compression approach that uses machine learning-based denoising and standard image codecs to compress raw electron microscopy imagery of neuropil up to 17-fold with negligible loss of reconstruction accuracy.
    Neural Networks Optimally Compress the Sawbridge
    Aaron B. Wagner
    2021 Data Compression Conf. (DCC) (to appear)
    Neural-network-based compressors have proven to be remarkably effective at compressing those sources, such as images, that are nominally high-dimensional but presumed to be concentrated on a low-dimensional manifold. We consider a continuous-time random process that models an extreme version of such a source, wherein the realizations fall along a one-dimensional "curve" in function space that has infinite-dimensional linear span. We precisely characterize the optimal entropy-distortion tradeoff for this source and show numerically that it is achieved by neural-network-based compressors trained with stochastic gradient descent. In contrast, we show both analytically and experimentally that classical compressors based on the Karhunen-Loève transform are highly suboptimal at high rates.
    Pre-trained convolutional neural networks (CNNs) are very powerful as off-the-shelf feature generators and have been shown to perform very well on a variety of tasks. Unfortunately, the generated features are high dimensional and expensive to store: potentially hundreds of thousands of floats per example when processing videos. Traditional entropy-based lossless compression methods are of little help, as they do not yield the desired level of compression, while general-purpose lossy alternatives (e.g. dimensionality reduction techniques) are sub-optimal, as they end up losing important information. We propose a learned method that jointly optimizes for compressibility along with the original objective for learning the features. The plug-in nature of our method makes it straightforward to integrate with any target objective and trade off against compressibility. We present results on multiple benchmarks and demonstrate that features learned by our method maintain their informativeness while being an order of magnitude more compressible.
    Scalable Model Compression by Entropy Penalized Reparameterization
    Deniz Oktay
    Abhinav Shrivastava
    8th Int. Conf. on Learning Representations (ICLR) (2020)
    We describe a simple and general neural network weight compression approach, in which the network parameters (weights and biases) are represented in a “latent” space, amounting to a reparameterization. This space is equipped with a learned probability model, which is used to impose an entropy penalty on the parameter representation during training, and to compress the representation using a simple arithmetic coder after training. Classification accuracy and model compressibility are maximized jointly, with the bitrate–accuracy trade-off specified by a hyperparameter. We evaluate the method on the MNIST, CIFAR-10 and ImageNet classification benchmarks using six distinct model architectures. Our results show that state-of-the-art model compression can be achieved in a scalable and general way without requiring complex procedures such as multi-stage training.
    Despite considerable progress on end-to-end optimized deep networks for image compression, video coding remains a challenging task. Recently proposed methods for learned video compression use optical flow and bilinear warping for motion compensation and show competitive rate-distortion performance relative to hand-engineered codecs like H.264 and HEVC. However, these learning-based methods rely on complex architectures and training schemes including the use of pre-trained optical flow networks, sequential training of sub-networks, adaptive rate control, and buffering intermediate reconstructions to disk during training. In this paper, we show that a generalized warping operator that better handles common failure cases, e.g. disocclusions and fast motion, can provide competitive compression results with a greatly simplified model and training procedure. Specifically, we propose scale-space flow, an intuitive generalization of optical flow that adds a scale parameter to allow the network to better model uncertainty. Our experiments show that a low-latency video compression model (no B-frames) using scale-space flow for motion compensation can outperform analogous state-of-the-art learned video compression models while being trained using a much simpler procedure and without any pre-trained optical flow networks.
    An Unsupervised Information-Theoretic Perceptual Quality Metric
    Sangnie Bhardwaj
    Advances in Neural Information Processing Systems 33 (2020)
    Tractable models of human perception have proved to be challenging to build. Hand-designed models such as MS-SSIM remain popular predictors of human image quality judgements due to their simplicity and speed. Recent deep learning approaches can perform better, but they rely on supervised data which can be costly to gather: large sets of class labels such as ImageNet, image quality ratings, or both. We combine recent advances in information-theoretic objective functions with a computational architecture informed by the physiology of the human visual system and unsupervised training on pairs of video frames, yielding our Perceptual Information Metric (PIM). We show that PIM is competitive with supervised metrics on the recent and challenging BAPPS image quality assessment dataset and outperforms them in predicting the ranking of image compression methods in CLIC 2020. We also perform qualitative experiments using the ImageNet-C dataset, and establish that PIM is robust with respect to architectural details.
    We consider the problem of using variational latent-variable models for data compression. For such models to produce a compressed binary sequence, which is the universal data representation in a digital world, the latent representation needs to be subjected to entropy coding. Range coding as an entropy coding technique is optimal, but it can fail catastrophically if the computation of the prior differs even slightly between the sending and the receiving side. Unfortunately, this is a common scenario when floating point math is used and the sender and receiver operate on different hardware or software platforms, as numerical round-off is often platform dependent. We propose using integer networks as a universal solution to this problem, and demonstrate that they enable reliable cross-platform encoding and decoding of images using variational models.
    Image compression methods using neural networks have reached or exceeded the performance of non-neural methods (such as JPEG, WebP, BPG). While these networks are state of the art in rate-distortion performance, the computational feasibility of these models remains a challenge. Our work provides three novel contributions: a run-time improvement to the Generalized Divisive Normalization formulation; a regularization technique targeted at optimizing neural image decoders; and an analysis of the trade-offs in 207 architecture variations across multiple distortion loss functions, which we use to recommend an architecture that is twice as fast while maintaining state-of-the-art image compression performance.
    Recent models for learned image compression are based on autoencoders that learn approximately invertible mappings from pixels to a quantized latent representation. The transforms are combined with an entropy model, which is a prior on the latent representation that can be used with standard arithmetic coding algorithms to generate a compressed bitstream. Recently, hierarchical entropy models were introduced as a way to exploit more structure in the latents than previous fully factorized priors, improving compression performance while maintaining end-to-end optimization. Inspired by the success of autoregressive priors in probabilistic generative models, we examine autoregressive, hierarchical, and combined priors as alternatives, weighing their costs and benefits in the context of image compression. While it is well known that autoregressive models can incur a significant computational penalty, we find that in terms of compression performance, autoregressive and hierarchical priors are complementary and can be combined to exploit the probabilistic structure in the latents better than all previous learned models. The combined model yields state-of-the-art rate–distortion performance and generates smaller files than existing methods: 15.8% rate reductions over the baseline hierarchical model and 59.8%, 35%, and 8.4% savings over JPEG, JPEG2000, and BPG, respectively. To the best of our knowledge, our model is the first learning-based method to outperform the top standard image codec (BPG) on both the PSNR and MS-SSIM distortion metrics.
    We assess the performance of two techniques in the context of nonlinear transform coding with artificial neural networks, Sadam and GDN. Both techniques have been successfully used in state-of-the-art image compression methods, but their performance has not been individually assessed to this point. Together, the techniques stabilize the training procedure of nonlinear image transforms and increase their capacity to approximate the (unknown) rate–distortion optimal transform functions. Besides comparing their performance to established alternatives, we detail the implementation of both methods and provide open-source code along with the paper.
    Towards a Semantic Perceptual Image Metric
    Sung Jin Hwang
    Sergey Ioffe
    Sean O'Malley
    Charles Rosenberg
    2018 25th IEEE Int. Conf. on Image Processing (ICIP)
    We present a full-reference perceptual image metric based on VGG-16, an artificial neural network trained on object classification. We fit the metric to a new database based on 140k unique images annotated with ground truth by human raters who received minimal instruction. The resulting metric shows competitive performance on TID 2013, a database widely used to assess image quality assessment methods. More interestingly, it shows strong responses to objects potentially carrying semantic relevance such as faces and text, which we demonstrate using a visualization technique and ablation experiments. In effect, the metric appears to model a higher influence of semantic context on judgements, which we observe particularly in untrained raters. As the vast majority of users of image processing systems are unfamiliar with Image Quality Assessment (IQA) tasks, these findings may have significant impact on real-world applications of perceptual metrics.
    We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate–distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
    End-to-End Optimized Image Compression
    Valero Laparra
    Eero P. Simoncelli
    5th Int. Conf. on Learning Representations (ICLR) (2017)
    End-to-End Optimization of Nonlinear Transform Codes for Perceptual Quality
    Valero Laparra
    Eero P. Simoncelli
    2016 Picture Coding Symp. (PCS)