Brian Curless

Authored Publications
    Generative Powers of Ten
    Xiaojuan Wang
    Steve Seitz
    Ben Mildenhall
    Pratul Srinivasan
    Dor Verbin
    Aleksander Hołyński
    We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. This representation allows us to render continuously zooming videos, or explore different scales of the scene interactively. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content.
    We present FederNeRF, a method that takes a collection of photos of a subject (e.g. Roger Federer) captured across multiple years with arbitrary body poses and appearances, and enables rendering the subject with arbitrary novel combinations of viewpoint, body pose, and appearance. NeRFederer builds a customized neural volumetric 3D model of the subject that is able to render an entire space spanned by camera viewpoint, body pose, and appearance. A central challenge in this task is dealing with sparse observations; a given body pose is likely only observed by a single viewpoint with a single appearance, and a given appearance is only observed under a handful of different body poses. We address this issue by recovering a canonical T-pose neural volumetric representation of the subject that allows for changing appearance across different observations, but uses a shared pose-dependent motion field across all observations. We demonstrate that this approach, along with regularization of the recovered volumetric geometry to encourage smoothness, is able to recover a model that renders compelling images from novel combinations of viewpoint, pose, and appearance from these challenging unstructured photo collections, outperforming prior work for free-viewpoint human rendering.
    3D Moments from Near Duplicate Photos
    Qianqian Wang
    Zhengqi Li
    Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
    We introduce a new computational photography effect, starting from a pair of near duplicate photos that are prevalent in people's photostreams. Combining monocular depth synthesis and optical flow, we build a novel end-to-end system that can interpolate scene motion while simultaneously allowing independent control of the camera. We use our system to create short videos with scene motion and cinematic camera motion. We compare our method against two different baselines and demonstrate that our system outperforms them both qualitatively and quantitatively in publicly available benchmark datasets.
    FILM: Frame Interpolation for Large Motion
    Fitsum Reda
    Eric Tabellion
    Proceedings of the European conference on computer vision (ECCV) (2022)
    We present a frame interpolation algorithm that synthesizes an engaging slow-motion video from near-duplicate photos which often exhibit large scene motion. Near-duplicates interpolation is an interesting new application, but large motion poses challenges to existing methods. To address this issue, we adapt a feature extractor that shares weights across the scales, and present a “scale-agnostic” motion estimator. It relies on the intuition that large motion at finer scales should be similar to small motion at coarser scales, which boosts the number of available pixels for large motion supervision. To inpaint wide disocclusions caused by large motion and synthesize crisp frames, we propose to optimize our network with the Gram matrix loss that measures the correlation difference between features. To simplify the training process, we further propose a unified single-network approach that removes the reliance on additional optical-flow or depth network and is trainable from frame triplets alone. Our approach outperforms state-of-the-art methods on the Xiph large motion benchmark while performing favorably on Vimeo90K, Middlebury and UCF101. Source codes and pre-trained models are available at https://film-net.github.io.
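    The abstract above describes the Gram matrix loss only as measuring "the correlation difference between features". As a rough illustration of that general idea, and not the authors' implementation, the sketch below compares the channel-to-channel correlation matrices of two small feature maps in plain Python. The function names, the division by the number of positions, and the mean-squared reduction are all assumptions made for this example; the abstract does not specify the feature extractor, normalization, or weighting.

```python
from typing import List

# Assumed layout: C channels, each a flattened list of N activations.
Features = List[List[float]]


def gram_matrix(features: Features) -> List[List[float]]:
    """Return the C x C matrix of channel inner products, divided by N."""
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n for row_j in features]
        for row_i in features
    ]


def gram_loss(predicted: Features, target: Features) -> float:
    """Mean squared difference between the two Gram matrices (assumed reduction)."""
    g_pred = gram_matrix(predicted)
    g_true = gram_matrix(target)
    c = len(g_pred)
    total = sum(
        (g_pred[i][j] - g_true[i][j]) ** 2 for i in range(c) for j in range(c)
    )
    return total / (c * c)


if __name__ == "__main__":
    # Two channels, two spatial positions each (toy values).
    target = [[1.0, 0.0], [0.0, 1.0]]
    predicted = [[1.0, 1.0], [0.0, 0.0]]
    print(gram_loss(predicted, target))
```

    Because the Gram matrix sums over spatial positions, the loss in this sketch compares feature statistics rather than pixel-aligned values.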
    HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
    Chung-Yi Weng
    Pratul Srinivasan
    CVPR (Computer Vision and Pattern Recognition), IEEE and the Computer Vision Foundation (2022) (to appear)
    We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints or even a full 360-degree camera path for that particular frame and body pose. This task is particularly challenging, as it requires synthesizing photorealistic details of the body, as seen from various camera angles that may not exist in the input video, as well as synthesizing fine details such as cloth folds and facial appearance. Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps. The motion field is decomposed into skeletal rigid and non-rigid motions, produced by deep networks. We show significant performance improvements over prior work, and compelling examples of free-viewpoint renderings from monocular video of moving humans in challenging uncontrolled capture scenarios.
    Removing an object and its shadows from a photograph
    Edward Zhang
    Ricardo Martin-Brualla
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
    Removing objects from images is an important problem for many applications, such as in mixed reality. For believable removals, the shadows that the object casts onto the scene should also be removed. Current inpainting-based methods for object removal do not consider shadows, or at best require manually adding shadow regions to the inpainting mask. We introduce a deep learning pipeline for removing a shadow along with its caster. We leverage rough scene models in order to remove a wide variety of shadows (hard or soft, dark or subtle, large or thin) from planar surfaces with a wide variety of surface textures. We train our pipeline on synthetically rendered data, and show qualitative and quantitative results on both synthetic and real scenes.
    SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting
    Varun Jampani*
    Huiwen Chang*
    Kyle Gregory Sargent
    Abhishek Kar
    Mike Krainin
    Dominik Philemon Kaeser
    Ce Liu
    ICCV 2021 (2021)
    Single image 3D photography enables viewers to view a still image from novel viewpoints. Recent approaches for single-image view synthesis combine monocular depth network along with inpainting networks resulting in compelling novel view synthesis results. A drawback of these approaches is the use of hard layering making them not suitable to model intricate appearance effects such as matting. We present SLIDE, a modular and unified system for single image 3D photography that uses simple yet effective soft layering strategy to model appearance effects. In addition, we propose a novel depth-aware training of inpainting network suitable for 3D photography task. Extensive experimental analysis on 3 different view synthesis datasets in combination with user studies on in-the-wild image collections demonstrate the superior performance of our technique in comparison to existing strong baselines.
    Monster Mash: A Single-View Approach to Casual 3D Modeling and Animation
    Marek Dvoroznak
    Olga Sorkine-Hornung
    ACM Transactions on Graphics (TOG), ACM, New York, NY, USA (2020), pp. 1-12 (to appear)
    We present a new framework for sketch-based modeling and animation of 3D organic shapes that can work entirely in an intuitive 2D domain, enabling a playful, casual experience. Unlike previous sketch-based tools, our approach does not require a tedious part-based multi-view workflow with the explicit specification of an animation rig. Instead, we combine 3D inflation with a novel rigidity-preserving, layered deformation model, ARAP-L, to produce a smooth 3D mesh that is immediately ready for animation. Moreover, the resulting model can be animated from a single viewpoint, and without the need to handle unwanted inter-penetrations, as required by previous approaches. We demonstrate the benefit of our approach on a variety of examples produced by inexperienced users as well as professional animators. For less experienced users, our single-view approach offers a simpler modeling and animating experience than working in a 3D environment, while for professionals, it offers a quick and casual workspace for ideation.