Unsupervised Discovery of Parts, Structure, and Dynamics
Abstract
Humans easily recognize object parts and their hierarchical structure by watching
how the parts move; they can then predict how each part will move in the future. In this
paper, we propose a novel formulation that simultaneously learns a hierarchical,
disentangled object representation and a dynamics model for object parts from
unlabeled videos. Our Parts, Structure, and Dynamics (PSD) model learns to,
first, recognize the object parts via a layered image representation; second, predict
hierarchy via a structural descriptor that composes low-level concepts into a
hierarchical structure; and third, model the system dynamics by predicting the
future. Experiments on multiple real and synthetic datasets demonstrate that our
PSD model works well on all three tasks: segmenting object parts, building their
hierarchical structure, and capturing their motion distributions.