Training-Free Neural Matte Extraction for Visual Effects
Abstract
Alpha matting is widely used in video conferencing as well as in movies, television, and online video publishing sites such as YouTube. Deep learning approaches to the matte extraction problem are well suited to video conferencing due to the relatively consistent subject matter (front-facing humans); however, they are less appropriate for entertainment videos, where varied subjects (spaceships, monsters, etc.) may each appear only a few times. We introduce a \emph{one-shot} matte extraction approach that targets these applications. Our approach is based on the deep image prior, which optimizes a deep neural network to map a fixed random input to a single output, thereby providing a deep, hierarchical encoding of the particular image. We make use of the representations in the penultimate layer to interpolate coarse and incomplete ``trimap'' constraints. The algorithm is both very simple and surprisingly effective, though (in common with classic methods that solve large sparse linear systems) it is too slow for real-time or interactive use.
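The deep-image-prior idea underlying this approach can be illustrated with a minimal sketch: a randomly initialized convolutional network maps a fixed noise input to an alpha matte, and its weights are optimized against only the pixels labeled by the trimap, letting the network's inductive bias fill in the unknown region. Everything below (the toy image size, trimap layout, network architecture, and loss) is an illustrative assumption, not the paper's actual method, which additionally exploits penultimate-layer features.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: a 16x16 matte with a coarse, incomplete trimap.
# Trimap labels: 1 = definite foreground, 0 = definite background,
# -1 = unknown (no loss applied there).
torch.manual_seed(0)
H = W = 16
trimap = -torch.ones(H, W)
trimap[:4, :] = 0.0      # top rows scribbled as background
trimap[-4:, :] = 1.0     # bottom rows scribbled as foreground
known = trimap >= 0      # mask of constrained pixels

# Deep image prior: a fixed random input is mapped to a single output by a
# small conv net; only the network weights are optimized (training-free in
# the sense that no external dataset is used).
z = torch.randn(1, 8, H, W)
net = nn.Sequential(
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for step in range(200):
    opt.zero_grad()
    alpha = net(z)[0, 0]  # predicted matte in [0, 1]
    # Penalize disagreement with the trimap only where labels exist.
    loss = ((alpha[known] - trimap[known]) ** 2).mean()
    loss.backward()
    opt.step()

# The network's smoothness bias propagates alpha into the unknown band.
alpha = net(z)[0, 0].detach()
```

Because the optimization runs from scratch for every input, this per-image fitting loop is also why the method, like classic solvers of large sparse linear systems, is too slow for interactive use.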