Peyman Milanfar

I lead the Computational Imaging / Image Processing team in Google Research. My team develops core imaging technologies that are used in a number of products at Google.

One of these technologies is RAISR (Rapid and Accurate Image Super-Resolution): given an image, we wish to produce an image of larger size with significantly more pixels and higher image quality. Using pairs of example images, we train a set of filters (i.e., a mapping) that, when applied to a given image not in the training set, will produce a higher-resolution version of it. The work was highlighted in a Research Blog post. The technology was launched worldwide in G+ Photos, and also as part of the MotionStills app.
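
As a rough illustration of how such trained filters are applied at inference time, here is a minimal sketch in Python. The hashing of each patch by gradient angle, strength, and coherence follows the general RAISR recipe, but the quantization thresholds, filter-bank layout, and names below are illustrative assumptions, not the production implementation.

    import numpy as np

    def hash_key(patch, n_angle=24, n_strength=3, n_coherence=3):
        # Structure-tensor statistics of the patch gradients.
        gy, gx = np.gradient(patch)
        sxx, sxy, syy = (gx * gx).sum(), (gx * gy).sum(), (gy * gy).sum()
        tmp = np.sqrt((sxx - syy) ** 2 + 4.0 * sxy ** 2)
        l1 = max((sxx + syy + tmp) / 2.0, 0.0)   # dominant eigenvalue
        l2 = max((sxx + syy - tmp) / 2.0, 0.0)   # secondary eigenvalue
        angle = 0.5 * np.arctan2(2.0 * sxy, sxx - syy) % np.pi
        strength = np.sqrt(l1)
        coherence = (np.sqrt(l1) - np.sqrt(l2)) / (np.sqrt(l1) + np.sqrt(l2) + 1e-8)
        a = min(int(angle / np.pi * n_angle), n_angle - 1)
        s = min(int(strength / 0.2), n_strength - 1)          # illustrative thresholds
        c = min(int(coherence * n_coherence), n_coherence - 1)
        return a, s, c

    def apply_raisr(upscaled, filters, k=11):
        # 'upscaled' is a cheap (e.g. bilinear) upscale; each output pixel is
        # the learned filter for its hash bucket applied to the local patch.
        out = upscaled.copy()
        r = k // 2
        for i in range(r, upscaled.shape[0] - r):
            for j in range(r, upscaled.shape[1] - r):
                patch = upscaled[i - r:i + r + 1, j - r:j + r + 1]
                a, s, c = hash_key(patch)
                out[i, j] = (filters[a, s, c] * patch).sum()
        return out

Here filters would be an array of shape (n_angle, n_strength, n_coherence, k, k), learned offline by least squares from low/high-resolution example pairs, one filter per bucket.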

Another is Turbo Denoising for camera pipelines and other imaging applications. We produced a single-frame denoiser that is (1) fast enough to be practical even for mobile devices, and (2) handles the content-dependent noise that is typical of real camera captures. For realistic camera noise, our results are competitive with BM3D, but with a nearly 400x speedup. In other words, this technique sped up denoising by more than two orders of magnitude while producing state-of-the-art quality. As a side benefit, less noisy images compress better and lead to smaller file sizes.

Another is Style Transfer, the process of migrating the style of a given image to the content of another, synthesizing a new image that is an artistic mixture of the two. Our algorithm extends earlier work on texture synthesis, while aiming to produce stylized images that come closer in quality to those produced by convolutional neural networks. The proposed algorithm is fast and flexible, being able to process any pair of content and style images.

My team also works on more theoretical questions. For instance, in RED (Regularization by Denoising) we proposed a new way to use the denoising engine in defining the regularization for any inverse problem. RED uses an explicit image-adaptive Laplacian-based regularization functional, making the overall objective functional clear and well-defined. With complete flexibility to choose the iterative optimization procedure for minimizing this functional, RED can incorporate any image denoising algorithm, treats general inverse problems very effectively, and is guaranteed to converge to the globally optimal result. As examples of its utility, we test this approach and demonstrate state-of-the-art results in the image deblurring and super-resolution problems.
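
Concretely, for a data-fidelity term \ell(y, x) and a denoiser f, the RED objective from the paper reads

    E(x) = \ell(y, x) + \lambda \, \rho(x), \qquad \rho(x) = \tfrac{1}{2} \, x^{\top} \bigl( x - f(x) \bigr),

and under the paper's conditions on the denoiser (local homogeneity and a symmetric Jacobian), the gradient of the regularizer is simply \nabla \rho(x) = x - f(x), so any first-order solver can be driven by denoiser residuals alone.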

A bit about my background: Prior to joining Google, I was a Professor of Electrical Engineering at UC Santa Cruz from 1999 to 2014, and Associate Dean for Research at the School of Engineering from 2010 to 2012. From 2012 to 2014 I was on leave at Google[x], where I helped develop the imaging pipeline for Google Glass. I received my undergraduate education in electrical engineering and mathematics from the University of California, Berkeley, and the MS and PhD degrees in electrical engineering from MIT. I hold 11 US patents, several of which are commercially licensed. I founded MotionDSP in 2005. I have been a keynote speaker at numerous technical conferences, including the Picture Coding Symposium (PCS), SIAM Imaging Sciences, SPIE, and the IEEE International Conference on Multimedia and Expo (ICME). Along with my former students, I won several best paper awards from the IEEE Signal Processing Society.

I am a Distinguished Lecturer of the IEEE Signal Processing Society, and a Fellow of the IEEE "for contributions to inverse problems and super-resolution in imaging."

Please visit my public website for the most up-to-date list of my publications, CV, etc.
Authored Publications
    Video watermarking embeds a message into a cover video in an imperceptible manner, which can be retrieved even if the video undergoes certain modifications or distortions. Traditional watermarking methods are often manually designed for particular types of distortions and thus cannot simultaneously handle a broad spectrum of distortions. To this end, we propose a robust deep learning-based solution for video watermarking that is end-to-end trainable. Our model consists of a novel multiscale design where the watermarks are distributed across multiple spatial-temporal scales. Extensive evaluations on a wide variety of distortions show that our method outperforms traditional video watermarking methods as well as deep image watermarking models by a large margin. We further demonstrate the practicality of our method on a realistic video-editing application.
    Soft Diffusion: Score Matching with General Corruptions
    Giannis Daras
    Alexandros Dimakis
    Transactions on Machine Learning Research (TMLR) (2023)
    We define a broader family of corruption processes that generalizes previously known diffusion models. To reverse these general diffusions, we propose a new objective called Soft Score Matching. Soft Score Matching incorporates the degradation process in the network and provably learns the score function for any linear corruption process. Our new loss trains the model to predict a clean image that, after corruption, matches the diffused observation. This objective learns the gradient of the likelihood under suitable regularity conditions for the family of linear corruption processes. We further develop an algorithm to select the corruption levels for general diffusion processes and a novel sampling method that we call Momentum Sampler. We show experimentally that our framework works for general linear corruption processes, such as Gaussian blur and masking. Our method outperforms all linear diffusion models on CelebA-64, achieving an FID score of 1.85. We also show computational benefits compared to vanilla denoising diffusion.
    SVDiff: Compact Parameter Space for Diffusion Fine-Tuning
    Ligong Han
    Han Zhang
    Dimitris Metaxas
    IEEE/CVF International Conference on Computer Vision (ICCV) (2023)
    Diffusion models have achieved remarkable success in text-to-image generation, enabling the creation of high-quality images from text prompts or other modalities. However, existing methods for customizing these models are limited by handling multiple personalized subjects and the risk of overfitting. Moreover, their large number of parameters is inefficient for model storage. In this paper, we propose a novel approach to address these limitations in existing text-to-image diffusion models for personalization. Our method involves fine-tuning the singular values of the weight matrices, leading to a compact and efficient parameter space that reduces the risk of overfitting and language-drifting. We also propose a Cut-Mix-Unmix data-augmentation technique to enhance the quality of multi-subject image generation and a simple text-based image editing framework. Our proposed SVDiff method has a significantly smaller model size (1.7MB for StableDiffusion) compared to existing methods (vanilla DreamBooth 3.66GB, Custom Diffusion 73MB), making it more practical for real-world applications.
    Inversion by Direct Iteration (InDI) is a new formulation for supervised image restoration that avoids the so-called "regression to the mean" effect and produces more realistic and detailed images than existing regression-based methods. It does this by gradually improving image quality in small steps, similar to generative denoising diffusion models. Image restoration is an ill-posed problem where multiple high-quality images are plausible reconstructions of a given low-quality input. The outcome of a single-step regression model is therefore typically an aggregate of all possible explanations, and so lacks detail and realism. The main advantage of InDI is that it does not try to predict the clean target image in a single step but instead gradually improves the image in small steps, resulting in better perceptual quality. While generative denoising diffusion models also work in small steps, our formulation is distinct in that it does not require knowledge of any analytic form of the degradation process. Instead, we directly learn an iterative restoration process from low-quality and high-quality paired examples. InDI can be applied to virtually any image degradation, given paired training data. In conditional denoising diffusion image restoration, the denoising network generates the restored image by repeatedly denoising an initial image of pure noise, conditioned on the degraded input. Contrary to conditional denoising formulations, InDI directly proceeds by iteratively restoring the input low-quality image, producing high-quality results on a variety of image restoration tasks, including motion and out-of-focus deblurring, super-resolution, compression artifact removal, and denoising.
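
    As a hedged sketch of the small-step iteration (my paraphrase of the paper; the model interface, step count, and time embedding below are placeholders):

        import torch

        def indi_restore(y, model, steps=100):
            # Start at the degraded input (t = 1) and walk toward t = 0, at
            # each step blending the current iterate with the network's
            # prediction of the clean image. At the final step (t = delta),
            # the blend weight is 1 and the output is the prediction itself.
            x = y.clone()
            delta = 1.0 / steps
            for k in range(steps, 0, -1):
                t = k * delta
                x_hat = model(x, torch.tensor(t))   # clean-image prediction at time t
                x = (delta / t) * x_hat + (1.0 - delta / t) * x
            return x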
    Digital watermarking is widely used for copyright protection. Traditional 3D watermarking approaches or commercial software are typically designed to embed messages into 3D meshes, and later retrieve the messages directly from distorted/undistorted watermarked 3D meshes. However, in many cases, users only have access to rendered 2D images instead of 3D meshes. Unfortunately, retrieving messages from 2D renderings of 3D meshes is still challenging and underexplored. We introduce a novel end-to-end learning framework to solve this problem through: 1) an encoder to covertly embed messages in both mesh geometry and textures; 2) a differentiable renderer to render watermarked 3D objects from different camera angles and under varied lighting conditions; 3) a decoder to recover the messages from 2D rendered images. From our experiments, we show that our model can learn to embed information visually imperceptible to humans, and to retrieve the embedded information from 2D renderings that undergo 3D distortions. In addition, we demonstrate that our method can also work with other renderers, such as ray tracers and real-time renderers, with and without fine-tuning.
    Transformers have recently gained significant attention in the computer vision community. However, the lack of scalability of self-attention mechanisms with respect to image size has limited their wide adoption in state-of-the-art vision backbones. In this paper we introduce an efficient and scalable attention model we call multi-axis attention, which consists of two aspects: blocked local and dilated global attention. These design choices allow global-local spatial interactions on arbitrary input resolutions with only linear complexity. We also present a new architectural element by effectively blending our proposed attention model with convolutions, and accordingly propose a simple hierarchical vision backbone, dubbed MaxViT, by simply repeating the basic building block over multiple stages. Notably, MaxViT is able to “see” globally throughout the entire network, even in earlier, high-resolution stages. We demonstrate the effectiveness of our model on a broad spectrum of vision tasks. On image classification, MaxViT achieves state-of-the-art performance under various settings: without extra data, MaxViT attains 86.5% ImageNet-1K top-1 accuracy; with ImageNet-21K pre-training, our model achieves 88.7% top-1 accuracy. For downstream tasks, MaxViT as a backbone delivers favorable performance on object detection as well as visual aesthetic assessment. We also show that our proposed model expresses strong generative modeling capability on ImageNet, demonstrating the superior potential of MaxViT blocks as a universal vision module. The source code and trained models will be available at https://github.com/google-research/maxvit.
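
    The two attention axes amount to two simple tensor partitions. The sketch below (channels-last tensors; illustrative, not the released implementation) shows the blocked-local and dilated-global token groupings over which self-attention is then run:

        import torch

        def block_partition(x, p):
            # [B, H, W, C] -> [B*(H//p)*(W//p), p*p, C]: local p-by-p windows.
            B, H, W, C = x.shape
            x = x.reshape(B, H // p, p, W // p, p, C)
            x = x.permute(0, 1, 3, 2, 4, 5)
            return x.reshape(-1, p * p, C)

        def grid_partition(x, g):
            # [B, H, W, C] -> [B*(H//g)*(W//g), g*g, C]: a fixed g-by-g grid
            # whose tokens are strided across the whole image (dilated, global).
            B, H, W, C = x.shape
            x = x.reshape(B, g, H // g, g, W // g, C)
            x = x.permute(0, 2, 4, 1, 3, 5)
            return x.reshape(-1, g * g, C)

    Running attention over the middle axis of each result gives local interactions in the first case and global (strided) interactions in the second, both with cost linear in image size.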
    MAXIM: Multi-Axis MLP for Image Processing
    Han Zhang
    Alan Bovik
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022)
    Recent progress on Transformers and MLP-like models has shown new architecture design paradigms on many computer vision tasks. However, the efficacy and efficiency of these models for low-level vision tasks have not been studied extensively. In this paper, we present MAXIM, a general image processing architecture with multi-axis gated MLPs, to advance the possibility of global operators for low-level vision. Our single-stage MAXIM backbone shares a UNet-shaped hierarchical structure and enjoys long-range interactions brought by spatial-gated MLPs. Specifically, MAXIM contains two MLP-based building blocks. First, we devise a multi-axis gated MLP that allows efficient and scalable spatial mixing of local and global information. Second, we propose a cross-gating block, an alternative to cross-attention, which accounts for cross-example mutual conditioning. Both modules are exclusively based on MLPs, but benefit from being both global and 'fully-convolutional,' two desired properties for low-level vision tasks. Our extensive experimental results show that the proposed MAXIM model achieves state-of-the-art performance on more than ten benchmarks across a range of image processing tasks, including denoising, deblurring, deraining, dehazing, and enhancement, with fewer or comparable parameters and FLOPs.
    Interpretable Unsupervised Diversity Denoising and Artefact Removal
    Mangal Prakash
    Florian Jug
    International Conference on Learning Representations (2022)
    Image denoising and artefact removal are complex inverse problems admitting multiple valid solutions. Unsupervised diversity restoration, that is, obtaining a diverse set of possible restorations given a corrupted image, is important for ambiguity removal in many applications such as microscopy, where paired data for supervised training are often unobtainable. In real-world applications, imaging noise and artefacts are typically hard to model, leading to unsatisfactory performance of existing unsupervised approaches. This work presents an interpretable approach for unsupervised and diverse image restoration. To this end, we introduce a capable architecture called Hierarchical DivNoising (HDN) based on a hierarchical Variational Autoencoder. We show that HDN learns an interpretable multi-scale representation of artefacts, and we leverage this interpretability to remove imaging artefacts commonly occurring in microscopy data. Our method achieves state-of-the-art results on twelve benchmark image denoising datasets while providing access to a whole distribution of sensibly restored solutions. Additionally, we demonstrate on three real microscopy datasets that HDN removes artefacts without supervision, being the first method capable of doing so while generating multiple plausible restorations all consistent with the given corrupted image.
    Image deblurring is an ill-posed problem with multiple plausible solutions given a single input image. However, most existing methods produce a deterministic estimate of the clean image and are trained to minimize pixel-level distortion. These metrics are known to be poorly correlated with human perception, and often lead to unrealistic reconstructions. We present an alternative framework for single-image blind deblurring based on conditional diffusion models. Unlike existing techniques, we train a stochastic sampler that refines the output of a deterministic predictor and is capable of producing a diverse set of plausible reconstructions for a single input. This leads to a significant improvement in perceptual quality over existing state-of-the-art methods across multiple standard benchmarks. Our predict-and-refine approach also enables much more efficient sampling compared to the standard diffusion model. Combined with a carefully tuned network architecture and inference procedure, our method is shown to be competitive in terms of traditional quantitative distortion metrics such as PSNR. These results show clear benefits of stochastic diffusion-based methods for deblurring and challenge the widely used strategy of producing a single, deterministic reconstruction.
    Video quality assessment for User Generated Content (UGC) is an important topic in both industry and academia. Most existing methods only focus on one aspect of perceptual quality assessment, such as technical quality or compression artifacts. In this paper, we create a large-scale dataset to comprehensively investigate characteristics of generic UGC video quality. Besides the subjective ratings and content labels of the dataset, we also propose a DNN-based framework to thoroughly analyze the importance of content, technical quality, and compression level in perceptual quality. Our model is able to provide quality scores as well as human-friendly quality indicators, to bridge the gap between low-level video signals and human perceptual quality. Experimental results show that our model achieves state-of-the-art correlation with Mean Opinion Scores (MOS).
    Projected Distribution Loss for Image Enhancement
    2021 IEEE International Conference on Computational Photography (ICCP), pp. 1-12
    Features obtained from object detection CNNs have been widely used for measuring perceptual similarities between images. Such differentiable metrics can be used as perceptual learning losses to train image enhancement models. However, the choice of the distance function between input and target features may have a consequential impact on the performance of the trained model. While using the norm of the difference between extracted features leads to limited hallucination of details, measuring the distance between distributions of features may generate more textures, yet also more unrealistic details and artifacts. In this paper, we demonstrate that aggregating 1D-Wasserstein distances between CNN activations is more reliable than the existing approaches, and that it can significantly improve the perceptual performance of enhancement models. More explicitly, we show that in imaging applications such as denoising, super-resolution, demosaicing, deblurring, and JPEG artifact removal, the proposed learning loss outperforms the current state-of-the-art on reference-based perceptual losses. This means that the proposed learning loss can be plugged into different imaging frameworks and produce perceptually realistic results.
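
    A rough sketch of the loss (assuming each feature channel is treated as a set of 1D samples, so that sorting gives the empirical 1D Wasserstein distance; the paper's exact projection and aggregation may differ):

        import torch

        def projected_distribution_loss(feats_x, feats_y):
            # feats_* are lists of [B, C, H, W] activations from a pretrained
            # feature extractor (e.g. a VGG-style network). For equal-size 1D
            # samples, W1 is the mean absolute difference of sorted values.
            loss = 0.0
            for fx, fy in zip(feats_x, feats_y):
                b, c = fx.shape[:2]
                fx = fx.reshape(b, c, -1).sort(dim=-1).values
                fy = fy.reshape(b, c, -1).sort(dim=-1).values
                loss = loss + (fx - fy).abs().mean()
            return loss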
    Multi-path Neural Networks for On-device Multi-domain Visual Classification
    Andrew Howard
    Gabriel M. Bender
    Grace Chu
    Jeff Gilbert
    Joshua Greaves
    Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2021), pp. 3019-3028
    Learning multiple domains/tasks with a single model is important for improving data efficiency and lowering inference cost for numerous vision tasks, especially on resource-constrained mobile devices. However, hand-crafting a multi-domain/task model can be both tedious and challenging. This paper proposes a novel approach to automatically learn a multi-path network for multi-domain visual classification on mobile devices. The proposed multi-path network is learned from neural architecture search by applying one reinforcement learning controller for each domain to select the best path in the super-network created from a MobileNetV3-like search space. An adaptive balanced domain prioritization algorithm is proposed to balance optimizing the joint model on multiple domains simultaneously. The determined multi-path model selectively shares parameters across domains in shared nodes while keeping domain-specific parameters within non-shared nodes in individual domain paths. This approach effectively reduces the total number of parameters and FLOPS, encouraging positive knowledge transfer while mitigating negative interference across domains. Extensive evaluations on the Visual Decathlon dataset demonstrate that the proposed multi-path model achieves state-of-the-art performance in terms of accuracy, model size, and FLOPS against other approaches using MobileNetV3-like architectures. Furthermore, the proposed method improves average accuracy over learning single-domain models individually, and reduces the total number of parameters and FLOPS by 78% and 32% respectively, compared to the approach that simply bundles single-domain models for multi-domain learning.
    Learning to Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data
    Abdullah Abuolaim
    Michael S. Brown
    International Conference on Computer Vision (ICCV) (2021)
    Recent work has shown impressive results on data-driven defocus deblurring using the two-image views available on modern dual-pixel (DP) sensors. One significant challenge in this line of research is access to DP data. Despite many cameras having DP sensors, only a limited number provide access to the low-level DP sensor images. In addition, capturing training data for defocus deblurring involves a time-consuming and tedious setup requiring the camera's aperture to be adjusted. Some cameras with DP sensors (e.g., smartphones) do not have adjustable apertures, further limiting the ability to produce the necessary training data. We address the data capture bottleneck by proposing a procedure to generate realistic DP data synthetically. Our synthesis approach mimics the optical image formation found on DP sensors and can be applied to virtual scenes rendered with standard computer software. Leveraging these realistic synthetic DP images, we introduce a recurrent convolutional network (RCN) architecture that improves deblurring results and is suitable for use with single-frame and multi-frame data (e.g., video) captured by DP sensors. Finally, we show that our synthetic DP data is useful for training DNN models targeting video deblurring applications where access to DP data remains challenging.
    Could we compress images via standard codecs while avoiding visible artifacts? The answer is obvious -- this is doable as long as the bit budget is generous enough. What if the allocated bit-rate for compression is insufficient? Then unfortunately, artifacts are a fact of life. Many attempts were made over the years to fight this phenomenon, with various degrees of success. In this work we aim to break the unholy connection between bit-rate and image quality, and propose a way to circumvent compression artifacts by pre-editing the incoming image and modifying its content to fit the given bits. We design this editing operation as a learned convolutional neural network, and formulate an optimization problem for its training. Our loss takes into account a proximity between the original image and the edited one, a bit-budget penalty over the proposed image, and a no-reference image quality measure for forcing the outcome to be visually pleasing. The proposed approach is demonstrated on the popular JPEG compression, showing savings in bits and/or improvements in visual quality, obtained with intricate editing effects.
    MUSIQ: Multi-scale Image Quality Transformer
    Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)
    Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native-resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ, SPAQ, and KonIQ-10k.
    Lossy image compression is necessary for efficient storage and transfer of data. Typically the trade-off between bit-rate and quality determines the optimal compression level. This makes the image quality metric an integral part of any imaging system. While the existing full-reference metrics such as PSNR and SSIM may be less sensitive to perceptual quality, the recently introduced learning methods may fail to generalize to unseen data. In this paper we propose the largest image compression quality dataset to date with human perceptual preferences, enabling the use of deep learning, and we develop a full-reference perceptual quality assessment metric for lossy image compression that outperforms the existing state-of-the-art methods. We show that the proposed model can effectively learn from thousands of examples available in the new dataset, and consequently it generalizes better to other unseen datasets of human perceptual preference. The CIQA dataset can be found at https://github.com/google-research/google-research/tree/master/CIQA
    Most video super-resolution methods focus on restoring high-resolution video frames from low-resolution videos without taking compression into account. However, most videos on the web or mobile devices are compressed, and the compression can be severe when the bandwidth is limited. In this paper, we propose a new compression-informed video super-resolution model to restore high-resolution content without introducing artifacts caused by compression. The proposed model consists of three modules for video super-resolution: bi-directional recurrent warping, detail-preserving flow estimation, and Laplacian enhancement. All three modules are used to deal with compression properties such as the location of the intra-frames in the input and smoothness in the output frames. For thorough performance evaluation, we conducted extensive experiments on standard datasets with a wide range of compression rates, covering many real video use cases. We showed that our method not only recovers high-resolution content on uncompressed frames from the widely used benchmark datasets, but also achieves state-of-the-art performance in super-resolving compressed videos based on numerous quantitative metrics. We also evaluated the proposed method by simulating streaming from YouTube to demonstrate its effectiveness and robustness. The source code and trained models are available at https://github.com/google-research/google-research/tree/master/comisr
    We present a highly efficient blind image restoration method to remove mild blur in natural images. Contrary to the mainstream, we focus on removing slight blur that is often present, damaging image quality, and that is commonly generated by small out-of-focus, lens blur, or slight camera motion. The proposed algorithm first estimates image blur and then compensates for it by combining multiple applications of the estimated blur in a principled way. In this sense, we present a novel procedure to design the approximate inverse of a filter using only re-applications of the filter itself. To estimate image blur in natural images we introduce a simple yet robust algorithm based on empirical observations about the distribution of the gradient in sharp images. Our experiments show that, in the context of mild blur, the proposed method outperforms traditional and modern blind deconvolution methods and runs in a fraction of the time. We finally show that the method can be used to blindly correct blur before applying an off-the-shelf deep super-resolution model, leading to superior results compared to other highly complex and computationally demanding methods. The proposed method can estimate and remove mild blur from a 12Mp image on a modern mobile phone in a fraction of a second.
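
    A standard way to build an approximate inverse out of re-applications of a filter is a truncated Neumann series, K^{-1} ~ sum_{i=0..p} (I - K)^i. The sketch below illustrates that idea for a known, mild Gaussian blur; the paper's actual construction and its blur-estimation step differ in detail:

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def approx_inverse(y, blur, order=3):
            # x = sum_{i=0..order} (I - K)^i y, using only re-applications of K.
            x, term = y.copy(), y.copy()
            for _ in range(order):
                term = term - blur(term)   # one more application of (I - K)
                x = x + term
            return x

        # Example: compensate a mild Gaussian blur of known width.
        img = np.random.rand(128, 128)
        blurred = gaussian_filter(img, sigma=1.0)
        deblurred = approx_inverse(blurred, lambda z: gaussian_filter(z, sigma=1.0))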
    Learning to Resize Images for Computer Vision Tasks
    ICCV 2021: International Conference on Computer Vision (2021)
    For all the ways convolutional neural nets have revolutionized computer vision in recent years, one important aspect has received surprisingly little attention: the effect of image size on the accuracy of tasks being trained for. Typically, to be efficient, the input images are resized to a relatively small spatial resolution (e.g., 224×224), and both training and inference are carried out at this resolution. The actual mechanism for this re-scaling has been an afterthought: namely, off-the-shelf image resizers such as bilinear and bicubic are commonly used in most machine learning software frameworks. But do these resizers limit the on-task performance of the trained networks? The answer is yes. Indeed, we show that the typical linear resizer can be replaced with learned resizers that can substantially improve performance. Importantly, while the classical resizers typically result in better perceptual quality of the downscaled images, our proposed learned resizers do not necessarily give better visual quality, but instead improve task performance. Our learned image resizer is jointly trained with a baseline vision model. This learned CNN-based resizer creates machine-friendly visual manipulations that lead to a consistent improvement of the end-task metric over the baseline model. Specifically, here we focus on the classification task with the ImageNet dataset, and experiment with four different models to learn resizers adapted to each model. Moreover, we show that the proposed resizer can also be useful for fine-tuning the classification baselines for other vision tasks. To this end, we experiment with three different baselines to develop image quality assessment (IQA) models on the AVA dataset.
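
    A minimal sketch of a learned resizer of this kind: a bilinear skip connection with a small residual CNN, trained jointly with the downstream model (layer sizes here are illustrative, not the paper's):

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class LearnedResizer(nn.Module):
            def __init__(self, ch=16):
                super().__init__()
                self.head = nn.Conv2d(3, ch, 3, padding=1)
                self.body = nn.Sequential(
                    nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2),
                    nn.Conv2d(ch, ch, 3, padding=1),
                )
                self.tail = nn.Conv2d(ch, 3, 3, padding=1)

            def forward(self, x, size=(224, 224)):
                # Residual refinement around a plain bilinear resize.
                skip = F.interpolate(x, size=size, mode='bilinear', align_corners=False)
                h = F.interpolate(self.head(x), size=size, mode='bilinear', align_corners=False)
                return skip + self.tail(h + self.body(h))

        # Joint training with the task model, e.g.:
        #   logits = classifier(resizer(images))
        #   loss = F.cross_entropy(logits, labels)   # gradients flow into the resizer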
    JPEG is an old yet popular image compression format, supported by all imaging devices and software packages. A key ingredient governing its performance is the two quantization tables (for Luma and Chroma) that dictate the loss induced on each DCT coefficient. Past work has offered various ideas for better tuning these tables, mainly focusing on rate-distortion performance and using derivative-free optimization techniques. This work offers a novel optimal tuning of these tables via continuous optimization, leveraging a differentiable implementation of both the JPEG encoder-decoder and an entropy estimator. This enables us to offer a unified framework that considers the interplay between four performance measures: rate, distortion, perceptual quality, and classification accuracy. We also propose a deep neural network design that automatically assigns optimized quantization tables to each incoming image. On all these fronts, we report a substantial boost in performance by a simple and easily implemented modification of these tables.
    Rank-smoothed Pairwise Learning in Perceptual Quality Assessment
    Ehsan Amid
    2020 IEEE International Conference on Image Processing (ICIP 2020)
    Conducting pairwise comparisons is a widely used approach in curating human perceptual preference data. Typically, raters are instructed to make their choices according to a specific set of rules that address certain dimensions of image quality and aesthetics. The outcome of this process is a dataset of sampled image pairs with their associated empirical preference probabilities. Training a model on these pairwise preferences is a common deep learning approach. However, optimizing by gradient descent through mini-batch learning means that the “global” ranking of the images is not explicitly taken into account. In other words, each step of the gradient descent relies only on a limited number of pairwise comparisons. In this work, we demonstrate that regularizing the pairwise empirical probabilities with aggregated rankwise probabilities leads to a more reliable training loss. We show that training a deep image quality assessment model with our rank-smoothed loss consistently improves the accuracy of predicting human preferences.
    Super-resolving Commercial Satellite Imagery Using Realistic Training Data
    Xiang Zhu
    Xinwei Shi
    IEEE International Conference on Image Processing 2020 (2020)
    In machine learning based single image super-resolution, the degradation model is embedded in training data generation. However, most existing satellite image super-resolution methods use a simple down-sampling model with a fixed kernel to create training images. These methods work fine on synthetic data, but do not perform well on real satellite images. We propose a realistic training data generation model for commercial satellite imagery products, which includes not only the imaging process on satellites but also the post-processing on the ground. We also propose a convolutional neural network optimized for satellite images. Experiments show that the proposed training data generation model is able to improve super-resolution performance on real satellite images.
    Image Stylization: From predefined to personalized
    Bart Wronski
    IET Computer Vision, special issue on Computer Vision for the Creative Industries (2020), pp. 14
    We present a framework for interactive design of new image stylizations using a wide range of predefined filter blocks. Both novel and off-the-shelf image filtering and rendering techniques are extended and combined to allow the user to unleash their creativity and intuitively invent, modify, and tune new styles from a given set of filters. In parallel to this manual design, we propose a novel procedural approach that automatically assembles sequences of filters, leading to unique and novel styles. An important aim of our framework is to allow for interactive exploration and design, as well as to enable videos and camera streams to be stylized on the fly. In order to achieve this real-time performance, we use the Best Linear Adaptive Enhancement (BLADE) framework, an interpretable shallow machine learning method that simulates complex filter blocks in real time. Our representative results include over a dozen styles designed using our interactive tool, a set of styles created procedurally, and new filters trained with our BLADE approach.
    Today's video transcoding pipelines choose transcoding parameters based on rate-distortion curves, which mainly focus on the relative quality difference between original and transcoded videos. By investigating the recently released YouTube UGC dataset, we found that people are more tolerant of quality changes in low-quality inputs than in high-quality inputs, which suggests that the current transcoding framework could be further optimized by considering input perceptual quality. An efficient machine learning based metric was proposed to detect low-quality inputs, whose bitrate can be further reduced without hurting perceptual quality. To evaluate the impact on perceptual quality, we conducted a crowd-sourcing subjective experiment, and provided a methodology to evaluate statistical significance among different treatments. The results showed that the proposed quality-guided transcoding framework is able to reduce the average bitrate by up to 5% with insignificant quality degradation.
    GIFnets: An end-to-end neural network based GIF encoding framework
    Innfarn Yoo
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Seattle (2020)
    Graphics Interchange Format (GIF) is a widely used image file format. Due to the limited number of palette colors, GIF encoding often introduces color banding artifacts. Traditionally, dithering is applied to reduce color banding, but it introduces dotted-pattern artifacts. To reduce artifacts and provide better and more efficient GIF encoding, we introduce a differentiable GIF encoding pipeline, which includes three novel neural networks: PaletteNet, DitherNet, and BandingNet. Each of these three networks provides an important functionality within the GIF encoding pipeline. PaletteNet predicts a near-optimal color palette given an input image. DitherNet manipulates the input image to reduce color banding artifacts and provides an alternative to traditional dithering. Finally, BandingNet is designed to detect color banding, and provides a new perceptual loss specifically for GIF images. As far as we know, this is the first fully differentiable GIF encoding pipeline based on deep neural networks and compatible with existing GIF decoders. A user study shows that our algorithm is better than Floyd-Steinberg based GIF encoding.
    Distortion Blind Deep Watermarking
    Ruohan Zhan
    Huiwen Chang
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
    Watermarking is the process of embedding information into an image that can survive under distortions, while requiring the encoded image to have little or no perceptual difference with the original image. Recently, deep learning-based methods achieved impressive results in both visual quality and message payload under a wide variety of image distortions. However, these methods all require differentiable models for the image distortions at training time, and may generalize poorly to unknown distortions. This is undesirable since the types of distortions applied to watermarked images are usually unknown and non-differentiable. In this paper, we propose a new framework for distortion-agnostic watermarking, where the image distortion is not explicitly modeled during training. Instead, the robustness of our system comes from two sources: adversarial training and channel coding. Compared to training on a fixed set of distortions and noise levels, our method achieves comparable or better results on distortions available during training, and better performance overall on unknown distortions.
    Handheld Multi-Frame Super-Resolution
    Bartlomiej Wronski
    Manfred Ernst
    Marc Levoy
    ACM Transactions on Graphics (TOG), vol. 38 (2019), pp. 18
    Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multiframe super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves to both increase image resolution and boost the signal-to-noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google's flagship phone.
    Automatically learned quality assessment for images has recently become a hot topic due to its usefulness in a wide variety of applications such as evaluating image capture pipelines, storage techniques and sharing mediums. Despite the subjective nature of this problem, most existing methods only predict the mean opinion score provided by datasets such as AVA [1] and TID2013 [2]. Our approach differs from others in that we predict the distribution of human opinion scores. Our architecture also has the advantage of being significantly simpler than other methods with comparable performance. Our proposed approach relies on the success (and retraining) of proven, state-of-the-art deep object recognition networks. Our resulting network can be used to not only score images reliably and with high correlation to human perception, but also to assist with adaptation and optimization of photo editing/enhancement algorithms in a photographic pipeline. All this is done without need of a “golden” reference image, consequently allowing for single-image, semantic- and perceptually-aware, no-reference quality assessment.
    Learned Perceptual Image Enhancement
    International Conference on Computational Photography (ICCP) (2018)
    Learning of a typical image enhancement pipeline involves minimization of a loss function between enhanced and reference images. While L1 and L2 losses are perhaps the most widely used functions for this purpose, they do not necessarily lead to perceptually compelling results. In this paper, we show that adding a learned no-reference image quality metric to the loss can significantly improve enhancement operators. This metric is a CNN (convolutional neural network) trained on a large-scale dataset labelled with the aesthetic preferences of human raters. This loss allows us to conveniently perform back-propagation in our learning framework to simultaneously optimize for similarity to a given ground truth reference and perceptual quality. This perceptual loss is only used to train the parameters of image processing operators, and does not impose any extra complexity at inference time. Our experiments demonstrate that this loss can be effective for tuning a variety of operators such as local tone mapping and dehazing.
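
    Schematically, the training objective pairs fidelity to the reference with the learned no-reference quality term. A hedged sketch, where quality_model stands in for the trained CNN aesthetic scorer (higher = better):

        import torch

        def enhancement_loss(pred, target, quality_model, lam=0.1):
            fidelity = (pred - target).abs().mean()   # e.g. L1 to the ground truth
            quality = quality_model(pred).mean()      # learned no-reference score
            return fidelity - lam * quality           # reward perceptual quality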
    Fast, Trainable, Multiscale Denoising
    John Isidoro
    IEEE International Conference on Image Processing (ICIP) (2018) (to appear)
    Denoising is a fundamental imaging application. Mobile camera systems demand filtering that is both versatile and fast. We present an approach to multiscale filtering which allows real-time applications on low-powered devices. The key idea is to learn a set of kernels that upscale, filter, and blend patches of different scales, guided by local structure analysis. This approach is trainable, so that the learned filters are capable of treating diverse noise patterns and artifacts. Experimental results show that the presented approach produces results comparable to state-of-the-art algorithms while processing time is orders of magnitude faster.
    The Rapid and Accurate Image Super Resolution (RAISR) method of Romano, Isidoro, and Milanfar is a computationally efficient image upscaling method using a trained set of filters. We describe a generalization of RAISR, which we name Best Linear Adaptive Enhancement (BLADE). This approach is a trainable edge-adaptive filtering framework that is general, simple, computationally efficient, and useful for a wide range of image processing problems. We show applications to denoising, compression artifact removal, demosaicing, and approximation of anisotropic diffusion equations.
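
    Schematically (notation paraphrased from the BLADE paper), the output at pixel i is a single learned filter applied to the surrounding patch,

        \hat{u}_i = \bigl( h^{s(i)} \bigr)^{\top} R_i z,

    where z is the input image, R_i extracts the patch centered at pixel i, and s(i) selects a filter from the bank using local structure analysis (structure-tensor features, as in RAISR). Training then decouples into an independent least-squares problem per filter bucket.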
    A novel, fast, and practical way of enhancing images is introduced in this paper. Our approach builds on Laplacian operators of well-known edge-aware kernels, such as bilateral and nonlocal means, and extends these filters' capabilities to perform more effective and fast image smoothing, sharpening, and tone manipulation. We propose an approximation of the Laplacian which does not require normalization of the kernel weights. Multiple Laplacians of the affinity weights endow our method with progressive detail decomposition of the input image from fine to coarse scale. These image components are blended by a structure mask, which avoids noise/artifact magnification or detail loss in the output image. Contributions of the proposed method to existing image editing tools are: (1) low computational and memory requirements, making it appropriate for mobile device implementations (e.g., as a finishing step in a camera pipeline); (2) a range of filtering applications from detail enhancement to denoising with only a few control parameters, enabling the user to apply a combination of various (and even opposite) filtering effects.
    Image denoising has reached impressive heights in performance and quality -- almost as good as it can ever get. But is this the only way in which tasks in image processing can exploit the image denoising engine? In this paper we offer Regularization by Denoising (RED): using the denoising engine in defining the regularization of the inverse problem. We propose an explicit image-adaptive Laplacian-based regularization functional, making the overall objective functional clear and well-defined. With complete flexibility to choose the iterative optimization procedure for minimizing the above functional, RED is capable of incorporating any image denoising algorithm, treating general inverse problems very effectively, and is guaranteed to converge to the globally optimal result. As examples of its utility, we test this approach and demonstrate state-of-the-art results in the image deblurring and super-resolution problems.
    A New Class of Image Filters Without Normalization
    International Conference on Image Processing, Phoenix, Arizona (2016)
    When applying a filter to an image, it often makes practical sense to maintain the local brightness level from input to output image. This is achieved by normalizing the filter coefficients so that they sum to one. This concept is generally taken for granted, but is particularly important where non-linear filters such as the bilateral or non-local means are concerned, where the effect on local brightness and contrast can be complex. Here we present a method for achieving the same level of control over the local filter behavior without the need for this normalization. Namely, we show how to closely approximate any normalized filter without in fact needing this normalization step. This yields a new class of filters. We derive a closed-form expression for the approximating filter and analyze its behavior, showing it to be easily controlled for quality and nearness to the exact filter with a single parameter. Our experiments demonstrate that the un-normalized affinity weights can be effectively used in applications such as image smoothing, sharpening, and detail enhancement.
    Global Image Denoising
    Hossein Talebi Esfandarani
    IEEE Transactions on Image Processing, vol. 23 (2014), pp. 755-768
    Removing Atmospheric Turbulence via Space-Invariant Deconvolution
    Xiang Zhu
    IEEE Trans. Pattern Anal. Mach. Intell., vol. 35 (2013), pp. 157-170
    Estimating Spatially Varying Defocus Blur From A Single Image
    Xiang Zhu
    Scott Cohen
    Stephen Schiller
    IEEE Transactions on Image Processing, vol. 22 (2013), pp. 4879-4891
    How to SAIF-ly Boost Denoising Performance
    Hossein Talebi Esfandarani
    Xiang Zhu
    IEEE Transactions on Image Processing, vol. 22 (2013), pp. 1470-1485
    Blind Deconvolution Using Alternating Maximum a Posteriori Estimation with Heavy-Tailed Priors
    Jan Kotera
    Filip Sroubek
    CAIP (2) (2013), pp. 59-66
    A Tour of Modern Image Filtering: New Insights and Methods, Both Practical and Theoretical
    IEEE Signal Process. Mag., vol. 30 (2013), pp. 106-128
    Symmetrizing Smoothing Filters
    SIAM J. Imaging Sciences, vol. 6 (2013), pp. 263-284
    Deconvolving PSFs for a Better Motion Deblurring Using Multiple Images
    Xiang Zhu
    Filip Sroubek
    ECCV (5) (2012), pp. 636-647
    Improving denoising filters by optimal diffusion
    Hossein Talebi Esfandarani
    ICIP (2012), pp. 1181-1184
    Robust flash denoising/deblurring by iterative guided filtering
    Hae Jong Seo
    EURASIP J. Adv. Sig. Proc., vol. 2012 (2012), pp. 3
    Robust Multichannel Blind Deconvolution via Fast Alternating Minimization
    Filip Sroubek
    IEEE Transactions on Image Processing, vol. 21 (2012), pp. 1687-1700
    Patch-Based Near-Optimal Image Denoising
    Priyam Chatterjee
    IEEE Transactions on Image Processing, vol. 21 (2012), pp. 1635-1649
    Removing Motion Blur With Space-Time Processing
    Hiroyuki Takeda
    IEEE Transactions on Image Processing, vol. 20 (2011), pp. 2990-3000
    Practical Bounds on Image Denoising: From Estimation to Information
    Priyam Chatterjee
    IEEE Transactions on Image Processing, vol. 20 (2011), pp. 1221-1233
    Patch-based locally optimal denoising
    Priyam Chatterjee
    ICIP (2011), pp. 2553-2556
    Action Recognition from One Example
    Hae Jong Seo
    IEEE Trans. Pattern Anal. Mach. Intell., vol. 33 (2011), pp. 867-882
    Superfast superresolution
    Filip Sroubek
    Jan Kamenický
    ICIP (2011), pp. 1153-1156
    Restoration for weakly blurred and strongly noisy images
    Xiang Zhu
    WACV (2011), pp. 103-109
    Iteratively merging information from a pair of flash/no-flash images using nonlinear diffusion
    Hae Jong Seo
    ICCV Workshops (2011), pp. 1324-1331
    Face Verification Using the LARK Representation
    Hae Jong Seo
    IEEE Transactions on Information Forensics and Security, vol. 6 (2011), pp. 1275-1286
    Visual saliency for automatic target detection, boundary detection, and image quality assessment
    Hae Jong Seo
    ICASSP (2010), pp. 5578-5581
    Is Denoising Dead?
    Priyam Chatterjee
    IEEE Transactions on Image Processing, vol. 19 (2010), pp. 895-911
    Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels
    Hae Jong Seo
    IEEE Trans. Pattern Anal. Mach. Intell., vol. 32 (2010), pp. 1688-1704
    Fundamental limits of image denoising: Are we there yet?
    Priyam Chatterjee
    ICASSP (2010), pp. 1358-1361
    A no-reference image content metric and its application to denoising
    Xiang Zhu
    ICIP (2010), pp. 1145-1148
    Automatic Parameter Selection for Denoising Algorithms Using a No-Reference Measure of Image Content
    Xiang Zhu
    IEEE Transactions on Image Processing, vol. 19 (2010), pp. 3116-3132
    Learning denoising bounds for noisy images
    Priyam Chatterjee
    ICIP (2010), pp. 1157-1160
    Nonlinear kernel backprojection for computed tomography
    Hiroyuki Takeda
    ICASSP (2010), pp. 618-621
    Detection of human actions from a single example
    Hae Jong Seo
    ICCV (2009), pp. 1965-1970
    A Non-Parametric Approach to Automatic Change Detection in MRI Images of the Brain
    Hae Jong Seo
    ISBI (2009), pp. 245-248
    Image denoising using locally learned dictionaries
    Priyam Chatterjee
    Computational Imaging (2009), pp. 72460
    Optimal Registration Of Aliased Images Using Variable Projection With Applications To Super-Resolution
    M. Dirk Robinson
    Sina Farsiu
    Comput. J., vol. 52 (2009), pp. 31-42
    Super-Resolution Without Explicit Subpixel Motion Estimation
    Hiroyuki Takeda
    Matan Protter
    Michael Elad
    IEEE Transactions on Image Processing, vol. 18 (2009), pp. 1958-1975
    An Adaptive Nonparametric Approach to Restoration and Interpolation for Medical Imaging
    Hiroyuki Takeda
    ISBI (2009), pp. 666-669
    Generalizing the Nonlocal-Means to Super-Resolution Reconstruction
    Matan Protter
    Michael Elad
    Hiroyuki Takeda
    IEEE Transactions on Image Processing, vol. 18 (2009), pp. 36-51
    Clustering-Based Denoising With Locally Learned Dictionaries
    Priyam Chatterjee
    IEEE Transactions on Image Processing, vol. 18 (2009), pp. 1438-1451
    On Iterative Regularization and Its Application
    Michael R. Charest
    IEEE Trans. Circuits Syst. Video Techn., vol. 18 (2008), pp. 406-411
    Spatio-temporal video interpolation and denoising using motion-assisted steering kernel (MASK) regression
    Hiroyuki Takeda
    Peter van Beek
    ICIP (2008), pp. 637-640
    Video denoising using higher order optimal space-time adaptation
    Hae Jong Seo
    ICASSP (2008), pp. 1249-1252
    Deblurring Using Regularized Locally Adaptive Kernel Regression
    Hiroyuki Takeda
    Sina Farsiu
    IEEE Transactions on Image Processing, vol. 17 (2008), pp. 550-563
    A generalization of non-local means via kernel regression
    Priyam Chatterjee
    Computational Imaging (2008), pp. 68140
    Using local regression kernels for statistical object detection
    Hae Jong Seo
    ICIP (2008), pp. 2380-2383
    Mask Design for Optical Microlithography-An Inverse Imaging Problem
    Amyn Poonawala
    IEEE Transactions on Image Processing, vol. 16 (2007), pp. 774-788
    Multi-Scale Statistical Detection and Ballistic Imaging Through Turbid Media
    Sina Farsiu
    ICIP (3) (2007), pp. 537-540
    Kernel Regression for Image Processing and Reconstruction
    Hiroyuki Takeda
    Sina Farsiu
    IEEE Transactions on Image Processing, vol. 16 (2007), pp. 349-366
    Video-to-Video Dynamic Super-Resolution for Grayscale and Color Sequences
    Sina Farsiu
    Michael Elad
    EURASIP J. Adv. Sig. Proc., vol. 2006 (2006)
    Statistical and Information-Theoretic Analysis of Resolution in Imaging
    Morteza Shahram
    IEEE Transactions on Information Theory, vol. 52 (2006), pp. 3411-3437
    Super-Resolution Imaging: Analysis, Algorithms, and Applications
    Michael K. P. Ng
    Tony F. Chan
    Moon Gi Kang
    EURASIP J. Adv. Sig. Proc., vol. 2006 (2006)
    Statistical performance analysis of super-resolution
    M. Dirk Robinson
    IEEE Transactions on Image Processing, vol. 15 (2006), pp. 1413-1428
    Robust Kernel Regression for Restoration and Reconstruction of Images from Sparse Noisy Data
    Hiroyuki Takeda
    Sina Farsiu
    ICIP (2006), pp. 1257-1260
    Multiframe demosaicing and super-resolution of color images
    Sina Farsiu
    Michael Elad
    IEEE Transactions on Image Processing, vol. 15 (2006), pp. 141-159
    Shape Estimation from Support and Diameter Functions
    Amyn Poonawala
    Richard J. Gardner
    Journal of Mathematical Imaging and Vision, vol. 24 (2006), pp. 229-244
    Multidimensional Integral Inversion, with Applications in Shape Reconstruction
    Annie A. M. Cuyt
    Gene H. Golub
    Brigitte Verdonk
    SIAM J. Scientific Computing, vol. 27 (2005), pp. 1058-1070
    Bias minimizing filter design for gradient-based image registration
    M. Dirk Robinson
    Sig. Proc.: Image Comm., vol. 20 (2005), pp. 554-568
    The Generalized Eigenvalue Problem for Nonsquare Pencils Using a Minimal Perturbation Approach
    Gregory Boutry
    Michael Elad
    Gene H. Golub
    SIAM J. Matrix Analysis Applications, vol. 27 (2005), pp. 582-601
    Variable projection for near-optimal filtering in low bit-rate block coders
    Yaakov Tsaig
    Michael Elad
    Gene H. Golub
    IEEE Trans. Circuits Syst. Video Techn., vol. 15 (2005), pp. 154-160
    Local detectors for high-resolution spectral analysis: Algorithms and performance
    Morteza Shahram
    Digital Signal Processing, vol. 15 (2005), pp. 305-316
    Improved spectral analysis of nearby tones using local detectors
    Morteza Shahram
    ICASSP (4) (2005), pp. 637-640
    On the resolvability of sinusoids with nearby frequencies in the presence of noise
    Morteza Shahram
    IEEE Transactions on Signal Processing, vol. 53 (2005), pp. 2579-2588
    Advances and challenges in super-resolution
    Sina Farsiu
    M. Dirk Robinson
    Michael Elad
    Int. J. Imaging Systems and Technology, vol. 14 (2004), pp. 47-57
    Trained detection of buried mines in SAR images via the deflection-optimal criterion
    Russell B. Cosgrove
    Joel Kositsky
    IEEE T. Geoscience and Remote Sensing, vol. 42 (2004), pp. 2569-2575
    Fast and robust multiframe super resolution
    Sina Farsiu
    M. Dirk Robinson
    Michael Elad
    IEEE Transactions on Image Processing, vol. 13 (2004), pp. 1327-1344
    Fundamental performance limits in image registration
    M. Dirk Robinson
    IEEE Transactions on Image Processing, vol. 13 (2004), pp. 1185-1199
    Imaging below the diffraction limit: a statistical analysis
    Morteza Shahram
    IEEE Transactions on Image Processing, vol. 13 (2004), pp. 677-689
    Shape from moments - an estimation theory perspective
    Michael Elad
    Gene H. Golub
    IEEE Transactions on Signal Processing, vol. 52 (2004), pp. 1814-1829
    Optimal framework for low bit-rate block coders
    Yaakov Tsaig
    Michael Elad
    Gene H. Golub
    ICIP (2) (2003), pp. 219-222
    Reconstruction of Convex Bodies from Brightness Functions
    R. J. Gardner
    Discrete Computational Geometry, vol. 29 (2003), pp. 279-303
    Fast Local and Global Projection-Based Methods for Affine Motion Estimation
    M. Dirk Robinson
    Journal of Mathematical Imaging and Vision, vol. 18 (2003), pp. 35-54
    Fundamental performance limits in image registration
    M. Dirk Robinson
    ICIP (2) (2003), pp. 323-326
    Fast and robust super-resolution
    Sina Farsiu
    M. Dirk Robinson
    Michael Elad
    ICIP (2) (2003), pp. 291-294
    A statistical analysis of diffraction-limited imaging
    Ali Shakouri
    ICIP (1) (2002), pp. 864-867
    Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement
    Nhat Nguyen
    Gene H. Golub
    IEEE Transactions on Image Processing, vol. 10 (2001), pp. 1299-1308
    A computationally efficient superresolution image reconstruction algorithm
    Nhat Nguyen
    Gene H. Golub
    IEEE Transactions on Image Processing, vol. 10 (2001), pp. 573-583
    An Efficient Wavelet-Based Algorithm for Image Superresolution
    Nhat Nguyen
    ICIP (2000), pp. 351-354
    Preconditioners for regularized image superresolution
    Nhat Nguyen
    Gene H. Golub
    ICASSP (1999), pp. 3249-3252
    Two-dimensional matched filtering for motion estimation
    IEEE Transactions on Image Processing, vol. 8 (1999), pp. 438-444
    A model of the effect of image motion in the Radon transform domain
    IEEE Transactions on Image Processing, vol. 8 (1999), pp. 1276-1281
    Motion from Projections: A Forward Model
    ICIP (2) (1998), pp. 695-699
    A moment-based variational approach to tomographic reconstruction
    William Clement Karl
    Alan S. Willsky
    IEEE Transactions on Image Processing, vol. 5 (1996), pp. 459-470
    On the Hough transform of a polygon
    Pattern Recognition Letters, vol. 17 (1996), pp. 209-210
    Reconstructing polygons from moments with connections to array processing
    George C. Verghese
    William Clement Karl
    Alan S. Willsky
    IEEE Transactions on Signal Processing, vol. 43 (1995), pp. 432-443
    Moment-Based Geometric Image Reconstruction
    William Clement Karl
    Alan S. Willsky
    ICIP (2) (1994), pp. 825-829
    Modeling and Estimation for a Class of Multiresolution Random Fields
    Robert R. Tenney
    Robert B. Washburn
    Alan S. Willsky
    ICIP (3) (1994), pp. 397-401
    Reconstructing Binary Polygonal Objects from Projections: A Statistical View
    William Clement Karl
    Alan S. Willsky
    CVGIP: Graphical Model and Image Processing, vol. 56 (1994), pp. 371-391