Unsupervised Discovery of Parts, Structure, and Dynamics

Zhenjia Xu
Zhijian Liu
Josh Tenenbaum
Jiajun Wu
ICLR (2019)

Abstract

Humans easily recognize object parts and their hierarchical structure by watching
how they move; they can then predict how each part moves in the future. In this
paper, we propose a novel formulation that simultaneously learns a hierarchical,
disentangled object representation and a dynamics model for object parts from
unlabeled videos. Our Parts, Structure, and Dynamics (PSD) model learns to (1)
recognize object parts via a layered image representation; (2) predict their
hierarchy via a structural descriptor that composes low-level concepts into a
hierarchical structure; and (3) model the system dynamics by predicting the
future. Experiments on multiple real and synthetic datasets demonstrate that our
PSD model works well on all three tasks: segmenting object parts, building their
hierarchical structure, and capturing their motion distributions.