Jonathan T. Barron

Authored Publications
    HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
    Chung-Yi Weng
    Pratul Srinivasan
    CVPR (Computer Vision and Pattern Recognition), IEEE and the Computer Vision Foundation (2022) (to appear)
    We introduce HumanNeRF, a free-viewpoint rendering method that works on a given monocular video of a human performing complex body motions, e.g., a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints, or even along a full 360-degree camera path, for that particular frame and body pose. This task is particularly challenging, as it requires synthesizing photorealistic details of the body as seen from camera angles that may not exist in the input video, as well as fine details such as cloth folds and facial appearance. Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps. The motion field is decomposed into skeletal rigid and non-rigid motions, produced by deep networks. We show significant performance improvements over prior work, and compelling examples of free-viewpoint renderings from monocular video of moving humans in challenging, uncontrolled capture scenarios.
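The abstract describes a motion field whose skeletal rigid part warps points backward from each video frame into the canonical T-pose. A minimal sketch of that rigid part only, assuming a linear-blend-skinning-style warp in which each bone has a rigid transform (R, t) from observation space to canonical space and each point carries per-bone blend weights. The names `apply_rigid` and `backward_warp` are hypothetical, and the learned weights and the non-rigid component are not modeled here.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def apply_rigid(rot: Mat3, trans: Vec3, x: Vec3) -> Vec3:
    """Apply the rigid transform x -> R x + t."""
    return tuple(
        sum(rot[i][j] * x[j] for j in range(3)) + trans[i] for i in range(3)
    )


def backward_warp(
    x_obs: Vec3,
    bones: Sequence[Tuple[Mat3, Vec3]],
    weights: Sequence[float],
) -> Vec3:
    """Map an observation-space point to canonical space as a normalized,
    weighted blend of per-bone rigid transforms (hypothetical skinning warp;
    the non-rigid offset HumanNeRF also learns is omitted)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("blend weights must sum to a positive value")
    warped = [0.0, 0.0, 0.0]
    for (rot, trans), w in zip(bones, weights):
        y = apply_rigid(rot, trans, x_obs)
        for i in range(3):
            warped[i] += (w / total) * y[i]
    return tuple(warped)
```

With two bones that only translate, by (1, 0, 0) and (0, 1, 0), and equal weights, the origin warps to (0.5, 0.5, 0.0). Normalizing the weights keeps the blend an affine combination of the per-bone results.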
    RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
    Michael Niemeyer
    Ben Mildenhall
    Andreas Geiger
    Noha Radwan
    Computer Vision and Pattern Recognition (CVPR) (2022)
    Neural Radiance Fields (NeRF) have emerged as a powerful representation for the task of novel view synthesis due to their simplicity and state-of-the-art performance. Though NeRF can produce photorealistic renderings of unseen viewpoints when many input views are available, its performance drops significantly when this number is reduced. We observe that the majority of artifacts in sparse input scenarios are caused by errors in the estimated scene geometry, and by divergent behavior at the start of training. We address this by regularizing the geometry and appearance of patches rendered from unobserved viewpoints, and annealing the ray sampling space during training. We additionally use a normalizing flow model to regularize the color of unobserved viewpoints. Our model outperforms not only other methods that optimize over a single scene, but in many cases also conditional models that are extensively pre-trained on large multi-view datasets.
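One simple way to express a geometry prior on patches rendered from unobserved viewpoints is a depth-smoothness penalty: render expected depth for a small patch of rays and penalize differences between neighboring pixels. The sketch below assumes that form; the exact loss and weighting used in the paper may differ, and `depth_smoothness_loss` is a hypothetical name.

```python
from typing import Sequence


def depth_smoothness_loss(depth: Sequence[Sequence[float]]) -> float:
    """Sum of squared differences between horizontally and vertically
    adjacent depths in a patch rendered from an unobserved view."""
    rows, cols = len(depth), len(depth[0])
    loss = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:  # vertical neighbor
                loss += (depth[i][j] - depth[i + 1][j]) ** 2
            if j + 1 < cols:  # horizontal neighbor
                loss += (depth[i][j] - depth[i][j + 1]) ** 2
    return loss
```

A flat patch scores 0, while the 2x2 patch `[[1, 2], [1, 2]]` scores 2.0, so the penalty pushes unsupervised views toward piecewise-smooth geometry.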
    How to train neural networks for flare removal
    Yicheng Wu
    Tianfan Xue
    Rahul Garg
    Jiawen Chen
    Ashok Veeraraghavan
    ICCV (2021)
    When a camera is pointed at a strong light source, the resulting photograph may contain lens flare artifacts. Flares appear in a wide variety of patterns (halos, streaks, color bleeding, haze, etc.) and this diversity in appearance makes flare removal challenging. Existing analytical solutions make strong assumptions about the artifact’s geometry or brightness, and therefore only work well on a small subset of flares. Machine learning techniques have shown success in removing other types of artifacts, like reflections, but have not been widely applied to flare removal due to the lack of training data. To solve this problem, we explicitly model the optical causes of flare either empirically or using wave optics, and generate semi-synthetic pairs of flare-corrupted and clean images. This enables us to train neural networks to remove lens flare for the first time. Experiments show our data synthesis approach is critical for accurate flare removal, and that models trained with our technique generalize well to real lens flares across different scenes, lighting conditions, and cameras.
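The semi-synthetic training pairs mentioned above can be sketched as follows, assuming that a flare simply adds light on top of the scene, so that a flare-only image is added to a clean image in linear intensity and the sum is clipped to model sensor saturation. This additive model and the name `make_training_pair` are assumptions for illustration; the paper's pipeline (how flare images are simulated or captured, and any further augmentation) is not reproduced here.

```python
from typing import List, Sequence, Tuple


def make_training_pair(
    clean: Sequence[float], flare: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (flare-corrupted input, clean target) for one image.

    Assumes both images hold linear-intensity pixel values in [0, 1]; adding
    the flare layer models extra light reaching the sensor, and clipping
    mimics saturation."""
    if len(clean) != len(flare):
        raise ValueError("images must have the same number of pixels")
    corrupted = [min(1.0, max(0.0, c + f)) for c, f in zip(clean, flare)]
    return corrupted, list(clean)
```

Composing in linear rather than gamma-encoded space is the natural choice under this assumption, because light from the scene and light from the flare add physically at the sensor.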
    Defocus Map Estimation and Blur Removal from a Single Dual-Pixel Image
    Ioannis Gkioulekas
    Jiawen Chen
    Neal Wadhwa
    Pratul Srinivasan
    Rahul Garg
    Shumian Xin
    Tianfan Xue
    International Conference on Computer Vision (2021)
    We present a method to simultaneously estimate an image's defocus map, i.e., the amount of defocus blur at each pixel, and remove the blur to recover a sharp all-in-focus image using only a single camera capture. Our method leverages data from dual-pixel sensors, which are common on many consumer cameras. Though originally designed to assist camera autofocus, dual-pixel sensors have been used to separately recover both defocus maps and all-in-focus images. Past approaches have solved these two problems in isolation and often require large labeled datasets for supervised training. In contrast with those prior works, we show that the two problems are connected, model the optics of dual-pixel images, and set up an optimization problem to jointly solve for both. We use data captured with a consumer smartphone camera to demonstrate that, after a one-time calibration step, our approach improves upon past approaches for both defocus map estimation and blur removal, without any supervised training.
    Neural Light Transport for Relighting and View Synthesis
    Xiuming Zhang
    Yun-Ta Tsai
    Tiancheng Sun
    Tianfan Xue
    Philip Davidson
    Christoph Rhemann
    Paul Debevec
    Ravi Ramamoorthi
    ACM Transactions on Graphics, 40 (2021)
    The light transport (LT) of a scene describes how it appears under different lighting and viewing directions, and complete knowledge of a scene's LT enables the synthesis of novel views under arbitrary lighting. In this paper, we focus on image-based LT acquisition, primarily for human bodies within a light stage setup. We propose a semi-parametric approach to learn a neural representation of LT that is embedded in the space of a texture atlas of known geometric properties, and model all non-diffuse and global LT as residuals added to a physically accurate diffuse base rendering. In particular, we show how to fuse previously seen observations of illuminants and views to synthesize a new image of the same scene under a desired lighting condition from a chosen viewpoint. This strategy allows the network to learn complex material effects (such as subsurface scattering) and global illumination, while guaranteeing the physical correctness of the diffuse LT (such as hard shadows). With this learned LT, one can relight the scene photorealistically with a directional light or an HDRI map, synthesize novel views with view-dependent effects, or do both simultaneously, all in a unified framework using a set of sparse, previously seen observations. Qualitative and quantitative experiments demonstrate that our neural LT (NLT) outperforms state-of-the-art solutions for relighting and view synthesis, without the separate treatment of the two problems that prior work requires.
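The residual design described above (a physically accurate diffuse base plus learned non-diffuse and global effects) can be illustrated per pixel. A minimal sketch, assuming a Lambertian base under a single directional light with unit-length normal and light vectors, and treating the network's residual as a given number. The names `lambertian` and `relit_pixel` are hypothetical, and in NLT the residual is predicted in texture-atlas space rather than supplied by hand.

```python
from typing import Sequence


def lambertian(albedo: float, normal: Sequence[float],
               light_dir: Sequence[float], intensity: float = 1.0) -> float:
    """Diffuse base for a directional light: albedo * I * max(0, n . l).
    Assumes normal and light_dir are unit vectors."""
    cos_theta = sum(n * l for n, l in zip(normal, light_dir))
    return albedo * intensity * max(0.0, cos_theta)


def relit_pixel(albedo: float, normal: Sequence[float],
                light_dir: Sequence[float], residual: float) -> float:
    """Final value = physically based diffuse base + learned residual
    (here the residual is a stand-in for a network prediction)."""
    return lambertian(albedo, normal, light_dir) + residual
```

Because hard shadows come from the analytic `max(0, n . l)` term, the residual only has to account for effects the diffuse model misses, such as specularities and subsurface scattering.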
    iNeRF: Inverting Neural Radiance Fields for Pose Estimation
    Yen-Chen Lin
    Pete Florence
    Phillip Isola
    Alberto Rodriguez
    Tsung-Yi Lin
    IROS 2021 (to appear)
    We present iNeRF, a framework that performs mesh-free pose estimation by “inverting” a Neural Radiance Field (NeRF). NeRFs have been shown to be remarkably effective for the task of view synthesis: synthesizing photorealistic novel views of real-world scenes or objects. In this work, we investigate whether we can apply analysis-by-synthesis via NeRF for mesh-free, RGB-only 6DoF pose estimation: given an image, find the translation and rotation of a camera relative to a 3D object or scene. Our method assumes that no object mesh models are available during either training or test time. Starting from an initial pose estimate, we use gradient descent to minimize the residual between pixels rendered from a NeRF and pixels in an observed image. In our experiments, we first study 1) how to sample rays during pose refinement for iNeRF to collect informative gradients and 2) how different batch sizes of rays affect iNeRF on a synthetic dataset. We then show that for complex real-world scenes from the LLFF dataset, iNeRF can improve NeRF by estimating the camera poses of novel images and using these images as additional training data for NeRF. Finally, we show iNeRF can perform category-level object pose estimation, including object instances not seen during training, with RGB images by inverting a NeRF model inferred from a single view.
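The analysis-by-synthesis loop described above (render from the current pose, compare with the observed pixels, step the pose down the gradient) can be shown on a toy problem. This sketch makes strong simplifying assumptions: the NeRF is replaced by a hypothetical differentiable 1D renderer `render` that draws a Gaussian blob at a scalar "pose" t, the 6DoF pose is reduced to that single translation, and the gradient is taken by central finite differences instead of automatic differentiation. The step size and iteration count are arbitrary choices for this toy.

```python
import math
from typing import List


def render(t: float, width: int = 32, sigma: float = 3.0) -> List[float]:
    """Toy stand-in for a NeRF: a 1D image of a blob centred at pixel t."""
    return [math.exp(-((x - t) ** 2) / (2 * sigma**2)) for x in range(width)]


def photometric_loss(t: float, observed: List[float]) -> float:
    """Mean squared residual between rendered and observed pixels."""
    rendered = render(t)
    return sum((r - o) ** 2 for r, o in zip(rendered, observed)) / len(observed)


def estimate_pose(observed: List[float], t0: float, lr: float = 20.0,
                  steps: int = 500, eps: float = 1e-4) -> float:
    """Refine the pose t by gradient descent on the photometric loss,
    starting from the initial estimate t0."""
    t = t0
    for _ in range(steps):
        grad = (photometric_loss(t + eps, observed)
                - photometric_loss(t - eps, observed)) / (2 * eps)
        t -= lr * grad
    return t
```

For example, `estimate_pose(render(16.0), t0=12.0)` converges to a value very close to 16.0. As in the real setting, this only works when the initial estimate lies within the basin where the rendered and observed content still overlap.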
    NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections
    Ricardo Martin-Brualla*
    Noha Radwan*
    Alexey Dosovitskiy
    Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
    We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs. We build on Neural Radiance Fields (NeRF), which uses the weights of a multilayer perceptron to model the density and color of a scene as a function of 3D coordinates. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena in uncontrolled images, such as variable illumination or transient occluders. We introduce a series of extensions to NeRF to address these issues, thereby enabling accurate reconstructions from unstructured image collections taken from the internet. We apply our system, dubbed NeRF-W, to internet photo collections of famous landmarks, and demonstrate temporally consistent novel view renderings that are significantly closer to photorealism than the prior state of the art.
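The density and color that NeRF's MLP predicts along a camera ray are turned into a pixel by the standard NeRF volume-rendering quadrature, C = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i with T_i = exp(−Σ_{j<i} σ_j δ_j). A minimal sketch, assuming the per-sample densities, colors, and segment lengths are already available (here passed in as lists rather than produced by an MLP). NeRF-W's per-image appearance embeddings and transient components are not shown, and `composite_ray` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def composite_ray(densities: Sequence[float], colors: Sequence[RGB],
                  deltas: Sequence[float]) -> RGB:
    """Alpha-composite samples along one ray, front to back."""
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return tuple(pixel)
```

For example, a half-transparent red sample (σδ = ln 2) in front of an opaque blue one yields (0.5, 0.0, 0.5). Every operation is differentiable, which is what lets the MLP be trained from photographs alone.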
    IBRNet: Learning Multi-View Image-Based Rendering
    Kyle Genova
    Pratul Srinivasan
    Qianqian Wang
    Ricardo Martin-Brualla
    Zhicheng Wang
    Conference on Computer Vision and Pattern Recognition (CVPR), IEEE (2021) (to appear)
    We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a multilayer perceptron (MLP) that generates RGBA at each 5D coordinate from multi-view image features. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that naturally generalizes to novel scene types and camera setups. Compared to previous generic image-based rendering (IBR) methods like multiplane images (MPIs) that use discrete volume representations, our method instead produces RGBA at continuous 5D locations (3D spatial locations and 2D viewing directions), enabling the rendering of high-resolution imagery. Our rendering pipeline is fully differentiable, and the only input required to train our method is a set of multi-view posed images. Experiments show that our method outperforms previous IBR methods, and achieves state-of-the-art performance when fine-tuned on each test scene.
    We present a deep learning solution for estimating the incident illumination at any 3D location within a scene from an input narrow-baseline stereo image pair. Previous approaches for predicting global illumination from images either predict just a single illumination for the entire scene, or separately estimate the illumination at each 3D location without enforcing that the predictions are consistent with the same 3D scene. Instead, we propose a deep learning model that estimates a 3D volumetric RGBA model of a scene, including content outside the observed field of view, and then uses standard volume rendering to estimate the incident illumination at any 3D location within that volume. Our model is trained without any ground truth 3D data and only requires a held-out perspective view near the input stereo pair and a spherical panorama taken within each scene as supervision, as opposed to prior methods for spatially-varying lighting estimation, which require ground truth scene geometry for training. We demonstrate that our method can predict consistent spatially-varying lighting that is convincing enough to plausibly relight and insert highly specular virtual objects into real images.
    Light Stage Super-Resolution: Continuous High-Frequency Relighting
    Tiancheng Sun
    Zexiang Xu
    Xiuming Zhang
    Christoph Rhemann
    Paul Debevec
    Yun-Ta Tsai
    Ravi Ramamoorthi
    SIGGRAPH Asia and TOG (2020)
    The light stage has been widely used in computer graphics for the past two decades, primarily to enable the relighting of human faces. By capturing the appearance of the human subject under different light sources, one obtains the light transport matrix of that subject, which enables image-based relighting in novel environments. However, due to the finite number of lights in the stage, the light transport matrix only represents a sparse sampling of the entire sphere. As a consequence, relighting the subject with a point light or a directional source that does not coincide exactly with one of the lights in the stage requires interpolating and resampling the images corresponding to nearby lights, and this leads to ghosting shadows, aliased specularities, and other artifacts. To ameliorate these artifacts and produce better results under arbitrary high-frequency lighting, this paper proposes a learning-based solution for the "super-resolution" of scans of human faces taken from a light stage. Given an arbitrary "query" light direction, our method aggregates the captured images corresponding to neighboring lights in the stage, and uses a neural network to synthesize a rendering of the face that appears to be illuminated by a "virtual" light source at the query location. This neural network must circumvent the inherent aliasing and regularity of the light stage data that was used for training, which we accomplish through the use of regularized traditional interpolation methods within our network. Our learned model is able to produce renderings for arbitrary light directions that exhibit realistic shadows and specular highlights, and is able to generalize across a wide variety of subjects.
    Our super-resolution approach enables more accurate renderings of human subjects under detailed environment maps, or the construction of simpler light stages that contain fewer light sources while still yielding renderings of quality comparable to those from light stages with more densely sampled lights.
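Image-based relighting with a light transport matrix relies on the linearity of light transport: the image under any combination of stage lights is the weighted sum of the one-light-at-a-time (OLAT) captures. The sketch below shows that sum together with a naive way to handle a query direction that falls between stage lights, namely blending the k nearest lights with inverse-angle weights. This is the kind of interpolation baseline the abstract says causes ghosting, not the paper's learned method. The weighting scheme and the names `relight` and `naive_query_weights` are assumptions for illustration, and all directions are assumed to be unit vectors.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def relight(olat_images: Sequence[Sequence[float]],
            light_weights: Sequence[float]) -> List[float]:
    """Linear relighting: weighted sum of OLAT images (flattened pixels)."""
    num_pixels = len(olat_images[0])
    out = [0.0] * num_pixels
    for image, w in zip(olat_images, light_weights):
        for p in range(num_pixels):
            out[p] += w * image[p]
    return out


def naive_query_weights(light_dirs: Sequence[Vec3], query: Vec3,
                        k: int = 3) -> List[float]:
    """Baseline weights for a query direction: blend the k nearest stage
    lights with inverse-angle weights (hypothetical interpolation scheme)."""
    angles = []
    for d in light_dirs:
        cos = sum(a * b for a, b in zip(d, query))
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    nearest = sorted(range(len(light_dirs)), key=lambda i: angles[i])[:k]
    weights = [0.0] * len(light_dirs)
    if angles[nearest[0]] < 1e-9:  # query coincides with a stage light
        weights[nearest[0]] = 1.0
        return weights
    inv = {i: 1.0 / angles[i] for i in nearest}
    total = sum(inv.values())
    for i in nearest:
        weights[i] = inv[i] / total
    return weights
```

Feeding `naive_query_weights(...)` into `relight(...)` cross-fades the neighboring captures, so a sharp shadow or highlight appears as several faint copies instead of moving continuously, which is the artifact the learned super-resolution model is meant to avoid.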