Associating Objects and their Effects in Unconstrained Monocular Video

Mohammed Suhail
Erika Lu
Zhengqi Li
Leonid Sigal
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2023

Abstract

We propose a method to decompose a video into a background and a set of foreground layers, where the background captures stationary elements while the foreground layers capture moving objects along with their associated effects (e.g., shadows and reflections). Our approach is designed for unconstrained monocular videos with arbitrary camera and object motion. Prior work that tackles this problem assumes that the video can be mapped onto a fixed 2D canvas, severely limiting the possible space of camera motion. Instead, our method applies recent progress in monocular camera pose and depth estimation to create a full RGBD video layer for the background, along with a video layer for each foreground object. To solve the underconstrained decomposition problem, we propose a new loss formulation based on multi-view consistency. We test our method on challenging videos with complex camera motion and show significant qualitative improvement over current methods.
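
The multi-view consistency idea referenced above is, in general terms, that the estimated layers, depths, and camera poses should reproject consistently across frames. Below is a minimal sketch in PyTorch of a generic photometric multi-view consistency term of this kind; the function names, tensor layouts, and the L1 photometric error are illustrative assumptions and not the paper's actual loss formulation.

# A minimal sketch (PyTorch) of a generic multi-view photometric consistency
# term: pixels in frame i are back-projected with estimated depth, reprojected
# into frame j with the relative camera pose, and the sampled colors are
# compared. Names, tensor layouts, and the L1 error are illustrative
# assumptions, not the paper's loss.
import torch
import torch.nn.functional as F


def warp_to_view(img_j, depth_i, K, K_inv, T_ij):
    """Warp frame j into the viewpoint of frame i using depth_i and pose T_ij.

    img_j:    (B, 3, H, W) colors of frame j
    depth_i:  (B, 1, H, W) depth of frame i
    K, K_inv: (B, 3, 3) camera intrinsics and inverse
    T_ij:     (B, 4, 4) transform from camera-i to camera-j coordinates
    """
    B, _, H, W = depth_i.shape
    device = depth_i.device

    # Pixel grid in homogeneous coordinates, shape (B, 3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).reshape(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D in camera i, then transform to camera j.
    cam_i = (K_inv @ pix) * depth_i.reshape(B, 1, -1)
    cam_i_h = torch.cat([cam_i, torch.ones(B, 1, H * W, device=device)], dim=1)
    cam_j = (T_ij @ cam_i_h)[:, :3]

    # Project into frame j and normalize to [-1, 1] for grid_sample.
    proj = K @ cam_j
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)

    return F.grid_sample(img_j, grid, align_corners=True)


def multiview_consistency_loss(img_i, img_j, depth_i, K, K_inv, T_ij):
    """L1 photometric error between frame i and frame j warped into view i."""
    warped = warp_to_view(img_j, depth_i, K, K_inv, T_ij)
    return (img_i - warped).abs().mean()

In practice a term like this would be combined with the layer decomposition so that the composited layers, rather than raw frames, are required to agree across views; that combination is specific to the paper and is not shown here.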
