DynIBaR: Neural Dynamic Image-Based Rendering
Abstract
We address the problem of synthesizing novel views from a monocular video depicting complex dynamic scenes.
State-of-the-art methods based on temporally varying Neural Radiance Fields (aka \emph{dynamic NeRFs}) have shown impressive results on this task.
However, for long videos with complex object motions and uncontrolled camera trajectories, these methods can produce blurry or inaccurate renderings, hampering their use in real-world applications.
Rather than encoding a dynamic scene within the weights of MLPs, we present a new method that addresses these limitations by adopting a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views in a scene-motion-aware manner.
Our system retains the advantages of prior methods for modeling complex scenes and view-dependent effects, while also enabling the synthesis of photo-realistic novel views from long videos featuring complex scene dynamics and unconstrained camera trajectories.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets, and also apply our approach to in-the-wild videos with challenging camera and object motion, where prior methods fail to produce high-quality renderings.