Learning ABCs: Approximate Bijective Correspondence for isolating factors of variation with weak supervision
Abstract
Representation learning forms the backbone of most deep learning applications, and the value of a learned representation depends on the information it carries about the different factors of variation.
Learning good representations is intimately tied to the nature of supervision and the learning algorithm.
We propose a novel algorithm that relies on a weak form of supervision where the data is partitioned into sets according to certain \textit{inactive} factors of variation.
Our key insight is that by seeking approximate correspondence between elements of different sets, we learn strong representations that exclude the inactive factors of variation and isolate the \textit{active} factors which vary within all sets.
We demonstrate that the method can work in a semi-supervised setting, and that a portion of the unlabeled data can belong to a different domain entirely, as long as the same active factors of variation are present.
By folding in data augmentation to suppress additional nuisance factors, we are able to further control the content of the learned representations.
We outperform competing baselines on the challenging problem of synthetic-to-real object pose transfer.
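The sketch below gives one concrete reading of the correspondence objective: a soft cycle-consistency loss that matches each element of one set to the other set and back, rewarding round trips that return to the starting element. It is a minimal schematic under assumed details (the name \texttt{abc\_loss}, the temperature, and the toy shapes are placeholders), not a definitive implementation of the method.

\begin{verbatim}
import torch
import torch.nn.functional as F

def abc_loss(z_a, z_b, temperature=0.1):
    # Normalize so dot products are cosine similarities.
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    # Soft correspondence from each element of set A to set B.
    p_ab = F.softmax(z_a @ z_b.t() / temperature, dim=1)  # (n, m)
    # Soft correspondence back from set B to set A.
    p_ba = F.softmax(z_b @ z_a.t() / temperature, dim=1)  # (m, n)
    # Probability that a_i returns to itself via the round trip.
    p_cycle = p_ab @ p_ba  # (n, n)
    return -torch.log(torch.diagonal(p_cycle) + 1e-8).mean()

# Two sets that share active factors (e.g., pose) but differ in an
# inactive factor (e.g., object identity):
z_a, z_b = torch.randn(32, 16), torch.randn(32, 16)
loss = abc_loss(z_a, z_b)
\end{verbatim}

Under this formulation, the loss can only be driven down when the embeddings encode the factors that vary within every set (the active factors): the inactive factors are constant within each set and so cannot disambiguate one element from another.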