Authored Publications
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
Michael Niemeyer
Ben Mildenhall
Andreas Geiger
Noha Radwan
Computer Vision and Pattern Recognition (CVPR) (2022)
Neural Radiance Fields (NeRF) have emerged as a powerful representation for the task of novel view synthesis due to their simplicity and state-of-the-art performance. Though NeRF can produce photorealistic renderings of unseen viewpoints when many input views are available, its performance drops significantly when this number is reduced. We observe that the majority of artifacts in sparse input scenarios are caused by errors in the estimated scene geometry, and by divergent behavior at the start of training. We address this by regularizing the geometry and appearance of patches rendered from unobserved viewpoints, and annealing the ray sampling space during training. We additionally use a normalizing flow model to regularize the color of unobserved viewpoints. Our model outperforms not only other methods that optimize over a single scene, but in many cases also conditional models that are extensively pre-trained on large multi-view datasets.
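As a rough illustration of the two regularizers the abstract mentions, the sketch below shows a depth-smoothness penalty on a patch rendered from an unobserved viewpoint, plus a schedule that anneals the ray-sampling interval outward from the scene center during training; all names and the specific schedule are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of RegNeRF-style regularization, not the authors' code.
import numpy as np

def depth_smoothness_loss(depth_patch):
    """Penalize depth differences between neighboring pixels of a patch rendered
    from an unobserved viewpoint (encourages locally smooth geometry)."""
    dx = depth_patch[:, 1:] - depth_patch[:, :-1]
    dy = depth_patch[1:, :] - depth_patch[:-1, :]
    return np.mean(dx ** 2) + np.mean(dy ** 2)

def annealed_sample_bounds(near, far, step, anneal_steps, midpoint=0.5):
    """Start sampling in a narrow band around the scene center and expand the
    interval to the full [near, far] range as training progresses."""
    t = min(1.0, step / anneal_steps)
    mid = near + midpoint * (far - near)
    return mid + (near - mid) * t, mid + (far - mid) * t

patch = np.random.rand(8, 8)          # an 8x8 depth patch from an unseen view
print(depth_smoothness_loss(patch))
print(annealed_sample_bounds(2.0, 6.0, step=100, anneal_steps=512))
```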
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
Chung-Yi Weng
Pratul Srinivasan
CVPR (Computer Vision and Pattern Recognition), IEEE and the Computer Vision Foundation (2022) (to appear)
We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints or even a full 360-degree camera path for that particular frame and body pose. This task is particularly challenging, as it requires synthesizing photorealistic details of the body, as seen from various camera angles that may not exist in the input video, as well as synthesizing fine details such as cloth folds and facial appearance. Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps. The motion field is decomposed into skeletal rigid and non-rigid motions, produced by deep networks. We show significant performance improvements over prior work, and compelling examples of free-viewpoint renderings from monocular video of moving humans in challenging uncontrolled capture scenarios.
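The backward-warp decomposition described above can be sketched as a skinning-style skeletal rigid warp followed by a learned non-rigid offset; the functions below are hypothetical stand-ins for the paper's deep networks.

```python
# Illustrative sketch only; placeholder functions, not the HumanNeRF networks.
import numpy as np

def skeletal_warp(x_obs, rotations, translations, blend_weights):
    """Blend per-bone rigid transforms (skinning style) to map an observation-space
    point toward the canonical T-pose."""
    candidates = np.stack([R @ x_obs + t for R, t in zip(rotations, translations)])
    return (blend_weights[:, None] * candidates).sum(axis=0)

def nonrigid_offset(x, pose_code):
    """Stand-in for a small network predicting a pose-conditioned residual offset."""
    return 0.01 * np.tanh(x + pose_code)

def observation_to_canonical(x_obs, rotations, translations, blend_weights, pose_code):
    x_rigid = skeletal_warp(x_obs, rotations, translations, blend_weights)
    return x_rigid + nonrigid_offset(x_rigid, pose_code)

# Two-bone toy example.
R = [np.eye(3), np.eye(3)]
t = [np.zeros(3), np.array([0.0, 0.1, 0.0])]
w = np.array([0.7, 0.3])
print(observation_to_canonical(np.array([0.2, 0.5, 1.0]), R, t, w, pose_code=np.zeros(3)))
```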
How to train neural networks for flare removal
Yicheng Wu
Tianfan Xue
Rahul Garg
Jiawen Chen
Ashok Veeraraghavan
ICCV (2021)
When a camera is pointed at a strong light source, the resulting photograph may contain lens flare artifacts. Flares appear in a wide variety of patterns (halos, streaks, color bleeding, haze, etc.) and this diversity in appearance makes flare removal challenging. Existing analytical solutions make strong assumptions about the artifact’s geometry or brightness, and therefore only work well on a small subset of flares. Machine learning techniques have shown success in removing other types of artifacts, like reflections, but have not been widely applied to flare removal due to the lack of training data. To solve this problem, we explicitly model the optical causes of flare either empirically or using wave optics, and generate semi-synthetic pairs of flare-corrupted and clean images. This enables us to train neural networks to remove lens flare for the first time. Experiments show our data synthesis approach is critical for accurate flare removal, and that models trained with our technique generalize well to real lens flares across different scenes, lighting conditions, and cameras.
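The semi-synthetic data strategy can be pictured as compositing a flare pattern (captured empirically or simulated with wave optics) onto a clean photo in linear radiance; the snippet below is a toy version of that pairing step, with hypothetical names and random arrays standing in for real captures.

```python
# Toy sketch of semi-synthetic pair generation; not the paper's data pipeline.
import numpy as np

def make_training_pair(clean_linear, flare_linear, gain=1.0):
    """Return a (flare-corrupted, clean) image pair; inputs are HxWx3 arrays in
    linear space, and the sum is clipped to the sensor's dynamic range."""
    corrupted = np.clip(clean_linear + gain * flare_linear, 0.0, 1.0)
    return corrupted, clean_linear

clean = np.random.rand(64, 64, 3) * 0.8   # placeholder for a flare-free photo
flare = np.random.rand(64, 64, 3) * 0.3   # placeholder for a captured/simulated flare
corrupted, target = make_training_pair(clean, flare)
print(corrupted.shape, target.shape)
```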
Neural Light Transport for Relighting and View Synthesis
Xiuming Zhang
Yun-Ta Tsai
Tiancheng Sun
Tianfan Xue
Philip Davidson
Christoph Rhemann
Paul Debevec
Ravi Ramamoorthi
ACM Transactions on Graphics, 40 (2021)
The light transport (LT) of a scene describes how it appears under different lighting and viewing directions, and complete knowledge of a scene's LT enables the synthesis of novel views under arbitrary lighting. In this paper, we focus on image-based LT acquisition, primarily for human bodies within a light stage setup. We propose a semi-parametric approach to learn a neural representation of LT that is embedded in the space of a texture atlas of known geometric properties, and model all non-diffuse and global LT as residuals added to a physically accurate diffuse base rendering. In particular, we show how to fuse previously seen observations of illuminants and views to synthesize a new image of the same scene under a desired lighting condition from a chosen viewpoint. This strategy allows the network to learn complex material effects (such as subsurface scattering) and global illumination, while guaranteeing the physical correctness of the diffuse LT (such as hard shadows). With this learned LT, one can relight the scene photorealistically with a directional light or an HDRI map, synthesize novel views with view-dependent effects, or do both simultaneously, all in a unified framework using a set of sparse, previously seen observations. Qualitative and quantitative experiments demonstrate that our neural LT (NLT) outperforms state-of-the-art solutions for relighting and view synthesis, without the separate treatment of the two problems that prior work requires.
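A compressed sketch of the residual decomposition described above: a Lambertian base rendering per texel of the texture atlas plus a learned residual for non-diffuse and global effects. The functions and feature layout are assumptions for illustration, not the NLT architecture.

```python
# Hypothetical illustration of "diffuse base + learned residual" shading per texel.
import numpy as np

def diffuse_base(albedo, normal, light_dir):
    """Physically based Lambertian term: albedo * max(n . l, 0) for each texel."""
    ndotl = np.clip((normal * light_dir).sum(axis=-1, keepdims=True), 0.0, None)
    return albedo * ndotl

def residual_net(features):
    """Stand-in for the network predicting non-diffuse / global-illumination residuals."""
    return 0.05 * np.tanh(features[..., :3])

def render_texels(albedo, normal, light_dir, features):
    return np.clip(diffuse_base(albedo, normal, light_dir) + residual_net(features), 0.0, 1.0)

# A 4x4 patch of the texture atlas under one directional light.
albedo = np.full((4, 4, 3), 0.6)
normal = np.tile(np.array([0.0, 0.0, 1.0]), (4, 4, 1))
features = np.random.rand(4, 4, 8)
print(render_texels(albedo, normal, np.array([0.0, 0.0, 1.0]), features).shape)
```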
IBRNet: Learning Multi-View Image-Based Rendering
Kyle Genova
Pratul Srinivasan
Qianqian Wang
Ricardo Martin-Brualla
Zhicheng Wang
Conference on Computer Vision and Pattern Recognition (CVPR), IEEE (2021) (to appear)
We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a multilayer perceptron (MLP) that generates RGBA at each 5D coordinate from multi-view image features. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that naturally generalizes to novel scene types and camera setups. Compared to previous generic image-based rendering (IBR) methods like multiplane images (MPIs) that use discrete volume representations, our method instead produces RGBAs at continuous 5D locations (3D spatial locations and 2D viewing directions), enabling the rendering of high-resolution imagery. Our rendering pipeline is fully differentiable, and the only input required to train our method is multi-view posed images. Experiments show that our method outperforms previous IBR methods, and achieves state-of-the-art performance when fine-tuned on each test scene.
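To make the pipeline concrete, the sketch below aggregates features projected from nearby source views, maps them to color and density with a placeholder MLP, and alpha-composites samples along a ray; it is a schematic under assumed shapes, not the released IBRNet model.

```python
# Schematic only: aggregated source-view features -> (rgb, sigma) -> volume rendering.
import numpy as np

def aggregate_features(source_feats):
    """Pool features from the nearby source views (mean/variance pooling here)."""
    return np.concatenate([source_feats.mean(axis=0), source_feats.var(axis=0)])

def tiny_mlp(x):
    """Placeholder for the MLP mapping aggregated features to color and density."""
    rgb = 1.0 / (1.0 + np.exp(-x[:3]))
    sigma = np.log1p(np.exp(x[3]))
    return rgb, sigma

def composite(rgbs, sigmas, deltas):
    """Standard volume rendering: weight each sample by alpha times transmittance."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    return ((trans * alphas)[:, None] * rgbs).sum(axis=0)

samples = [tiny_mlp(aggregate_features(np.random.rand(5, 4))) for _ in range(16)]
rgbs = np.stack([s[0] for s in samples])
sigmas = np.array([s[1] for s in samples])
print(composite(rgbs, sigmas, np.full(16, 0.1)))   # final pixel color
```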
iNeRF: Inverting Neural Radiance Fields for Pose Estimation
Yen-Chen Lin
Pete Florence
Phillip Isola
Alberto Rodriguez
Tsung-Yi Lin
IROS 2021 (to appear)
We present iNeRF, a framework that performs mesh-free pose estimation by “inverting” a Neural Radiance Field (NeRF). NeRFs have been shown to be remarkably effective for the task of view synthesis — synthesizing photorealistic novel views of real-world scenes or objects. In this work, we investigate whether we can apply analysis-by-synthesis via NeRF for mesh-free, RGB-only 6DoF pose estimation – given an image, find the translation and rotation of a camera relative to a 3D object or scene. Our method assumes that no object mesh models are available during either training or test time. Starting from an initial pose estimate, we use gradient descent to minimize the residual between pixels rendered from a NeRF and pixels in an observed image. In our experiments, we first study 1) how to sample rays during pose refinement for iNeRF to collect informative gradients and 2) how different batch sizes of rays affect iNeRF on a synthetic dataset. We then show that for complex real-world scenes from the LLFF dataset, iNeRF can improve NeRF by estimating the camera poses of novel images and using these images as additional training data for NeRF. Finally, we show iNeRF can perform category-level object pose estimation, including object instances not seen during training, with RGB images by inverting a NeRF model inferred from a single view.
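The analysis-by-synthesis loop can be illustrated with a toy renderer and finite-difference gradients over a 6-vector pose; the real method renders rays from a trained NeRF and backpropagates through it, so everything below is a simplified stand-in.

```python
# Toy pose refinement by minimizing a photometric residual; placeholder renderer.
import numpy as np

def render_pixels(pose, rays):
    """Stand-in for rendering sampled rays from a trained NeRF at the given pose."""
    return np.sin(rays @ pose[:3])[:, None] + pose[3:6]

def photometric_loss(pose, rays, observed):
    return np.mean((render_pixels(pose, rays) - observed) ** 2)

def refine_pose(pose, rays, observed, lr=1e-2, steps=200, eps=1e-4):
    """Gradient descent on the pose parameters using central finite differences."""
    for _ in range(steps):
        grad = np.zeros_like(pose)
        for i in range(pose.size):
            d = np.zeros_like(pose)
            d[i] = eps
            grad[i] = (photometric_loss(pose + d, rays, observed)
                       - photometric_loss(pose - d, rays, observed)) / (2 * eps)
        pose = pose - lr * grad
    return pose

rays = np.random.rand(64, 3)                       # sampled ray parameters
true_pose = np.array([0.3, -0.2, 0.5, 0.1, 0.0, -0.1])
observed = render_pixels(true_pose, rays)          # "observed image" pixels
print(refine_pose(np.zeros(6), rays, observed))
```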
NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
Ricardo Martin-Brualla*
Noha Radwan*
Alexey Dosovitskiy
Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs. We build on Neural Radiance Fields (NeRF), which uses the weights of a multilayer perceptron to model the density and color of a scene as a function of 3D coordinates. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena in uncontrolled images, such as variable illumination or transient occluders. We introduce a series of extensions to NeRF to address these issues, thereby enabling accurate reconstructions from unstructured image collections taken from the internet. We apply our system, dubbed NeRF-W, to internet photo collections of famous landmarks, and demonstrate temporally consistent novel view renderings that are significantly closer to photorealism than the prior state of the art.
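One way to handle variable illumination across uncontrolled photos is to condition the color prediction on a learned per-image appearance embedding; the sketch below illustrates that general idea with assumed shapes and a linear placeholder head, and is not the NeRF-W architecture.

```python
# Hypothetical sketch of per-image appearance embeddings conditioning a color head.
import numpy as np

rng = np.random.default_rng(0)
appearance_embeddings = rng.normal(size=(1000, 16))   # one latent vector per photo
W = rng.normal(size=(32 + 16, 3)) * 0.1               # placeholder color-head weights

def color_head(point_features, image_id):
    """Concatenate point features with the photo's embedding and map to RGB."""
    x = np.concatenate([point_features, appearance_embeddings[image_id]])
    return 1.0 / (1.0 + np.exp(-(x @ W)))

print(color_head(rng.normal(size=32), image_id=42))
```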
Defocus Map Estimation and Blur Removal from a Single Dual-Pixel Image
Ioannis Gkioulekas
Jiawen Chen
Neal Wadhwa
Pratul Srinivasan
Rahul Garg
Shumian Xin
Tianfan Xue
International Conference on Computer Vision (2021)
We present a method to simultaneously estimate an image's defocus map, i.e., the amount of defocus blur at each pixel, and remove the blur to recover a sharp all-in-focus image using only a single camera capture. Our method leverages data from dual-pixel sensors that are common on many consumer cameras. Though originally designed to assist camera autofocus, dual-pixel sensors have been used to separately recover both defocus maps and all-in-focus images. Past approaches have solved these two problems in isolation and often require large labeled datasets for supervised training. In contrast with those prior works, we show that the two problems are connected, model the optics of dual-pixel images, and set up an optimization problem to jointly solve for both. We use data captured with a consumer smartphone camera to demonstrate that after a one-time calibration step, our approach improves upon past approaches for both defocus map estimation and blur removal, without any supervised training.
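The connection between the two problems can be pictured with a toy 1-D image-formation model in which the two dual-pixel views are the latent sharp signal blurred by left and right half-aperture kernels whose size depends on defocus; the snippet below uses a single blur radius in place of a per-pixel defocus map and is not the paper's formulation.

```python
# Toy 1-D dual-pixel model and data term; one radius stands in for the defocus map.
import numpy as np

def half_kernel(radius, side):
    """One-sided box kernel approximating a half-aperture blur of the given radius."""
    k = np.zeros(2 * radius + 1)
    if side == "left":
        k[: radius + 1] = 1.0
    else:
        k[radius:] = 1.0
    return k / k.sum()

def dual_pixel_views(sharp, radius):
    left = np.convolve(sharp, half_kernel(radius, "left"), mode="same")
    right = np.convolve(sharp, half_kernel(radius, "right"), mode="same")
    return left, right

def data_term(sharp_est, radius_est, obs_left, obs_right):
    l, r = dual_pixel_views(sharp_est, radius_est)
    return np.sum((l - obs_left) ** 2 + (r - obs_right) ** 2)

sharp = np.sin(np.linspace(0, 6, 128))
obs_l, obs_r = dual_pixel_views(sharp, radius=3)
print(min(range(1, 6), key=lambda r: data_term(sharp, r, obs_l, obs_r)))  # recovers 3
```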
Sky Optimization: Semantically aware image processing in low-light photography
Yun-Ta Tsai
Huizhong Chen
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2020), pp. 526-527
The sky is a major component of the appearance of a photograph, and its color and tone can strongly influence the mood of a picture. In nighttime photography, the sky can also suffer from noise and color artifacts. For this reason, there is a strong desire to process the sky in isolation from the rest of the scene to achieve an optimal look. In this work, we propose an automated method, which can run as part of a camera pipeline, for creating accurate sky alpha-masks and using them to improve the appearance of the sky. Our method performs end-to-end sky optimization in less than half a second per image on a mobile device. We introduce a method for creating an accurate sky-mask dataset that is based on partially annotated images that are inpainted and refined by our modified weighted guided filter, and we use this dataset to train a neural network for semantic sky segmentation. Due to the compute and power constraints of mobile devices, sky segmentation is performed at a low image resolution, and our modified weighted guided filter is used for edge-aware upsampling to resize the alpha-mask to a higher resolution. With this detailed mask, we automatically apply post-processing steps to the sky in isolation, such as automatic spatially varying white balance, brightness adjustments, contrast enhancement, and noise reduction.
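The last step, applying adjustments to the sky in isolation, amounts to alpha-blending a processed sky layer back over the untouched foreground; the snippet below is a toy version with a uniform white-balance gain and a crude blur standing in for denoising, not the camera-pipeline implementation.

```python
# Toy sky-only adjustment via alpha blending; placeholder processing, not the pipeline.
import numpy as np

def adjust_sky(image, alpha, gain=(0.9, 0.95, 1.1), denoise=0.5):
    """Blend a processed sky layer back into the image using the sky alpha mask."""
    sky = np.clip(image * np.asarray(gain), 0.0, 1.0)           # uniform WB tweak
    blurred = (sky + np.roll(sky, 1, axis=0) + np.roll(sky, -1, axis=0)) / 3.0
    sky = (1 - denoise) * sky + denoise * blurred               # stand-in for denoising
    a = alpha[..., None]
    return a * sky + (1 - a) * image

image = np.random.rand(64, 64, 3)
alpha = np.linspace(1.0, 0.0, 64)[:, None] * np.ones((64, 64))  # sky toward the top
print(adjust_sky(image, alpha).shape)
```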
Light Stage Super-Resolution: Continuous High-Frequency Relighting
Tiancheng Sun
Zexiang Xu
Xiuming Zhang
Christoph Rhemann
Paul Debevec
Yun-Ta Tsai
Ravi Ramamoorthi
SIGGRAPH Asia and TOG (2020)
The light stage has been widely used in computer graphics for the past two decades, primarily to enable the relighting of human faces. By capturing the appearance of the human subject under different light sources, one obtains the light transport matrix of that subject, which enables image-based relighting in novel environments. However, due to the finite number of lights in the stage, the light transport matrix only represents a sparse sampling on the entire sphere. As a consequence, relighting the subject with a point light or a directional source that does not coincide exactly with one of the lights in the stage requires interpolation and resampling the images corresponding to nearby lights, and this leads to ghosting shadows, aliased specularities, and other artifacts. To ameliorate these artifacts and produce better results under arbitrary high-frequency lighting, this paper proposes a learning-based solution for the "super-resolution" of scans of human faces taken from a light stage. Given an arbitrary "query" light direction, our method aggregates the captured images corresponding to neighboring lights in the stage, and uses a neural network to synthesize a rendering of the face that appears to be illuminated by a "virtual" light source at the query location. This neural network must circumvent the inherent aliasing and regularity of the light stage data that was used for training, which we accomplish through the use of regularized traditional interpolation methods within our network. Our learned model is able to produce renderings for arbitrary light directions that exhibit realistic shadows and specular highlights, and is able to generalize across a wide variety of subjects. Our super-resolution approach enables more accurate renderings of human subjects under detailed environment maps, or the construction of simpler light stages that contain fewer light sources while still yielding comparable quality renderings as light stages with more densely sampled lights.
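The aggregation step can be sketched as picking the stage lights nearest a query direction and blending their captured images with regularized angular weights; the paper feeds such neighborhoods through a neural network rather than blending directly, so the code below, including the light count and the softmax weighting, is only an assumed illustration.

```python
# Assumed illustration of neighbor selection and weighting; not the paper's network.
import numpy as np

def interpolation_weights(light_dirs, query_dir, k=4, temperature=0.1):
    """Softmax over angular similarity of the k stage lights nearest the query."""
    sims = light_dirs @ query_dir
    idx = np.argsort(-sims)[:k]
    w = np.exp(sims[idx] / temperature)
    return idx, w / w.sum()

def blend_neighbors(images, light_dirs, query_dir):
    idx, w = interpolation_weights(light_dirs, query_dir)
    return np.tensordot(w, images[idx], axes=1)    # weighted blend of nearby captures

num_lights = 128                                   # arbitrary light count for the example
lights = np.random.randn(num_lights, 3)
lights /= np.linalg.norm(lights, axis=1, keepdims=True)
captures = np.random.rand(num_lights, 32, 32, 3)   # one captured image per stage light
query = np.array([0.0, 0.0, 1.0])
print(blend_neighbors(captures, lights, query).shape)
```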