Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning
Abstract
This paper presents KeypointNet, an end-to-end geometric reasoning framework to
learn an optimal set of category-specific 3D keypoints, along with their detectors.
Given a single image, KeypointNet extracts 3D keypoints that are optimized for
a downstream task. We demonstrate this framework on 3D pose estimation by
proposing a differentiable objective that seeks the optimal set of keypoints for
recovering the relative pose between two views of an object. Our model discovers
geometrically and semantically consistent keypoints across viewing angles and
instances of an object category. Importantly, we find that our end-to-end framework
using no ground-truth keypoint annotations outperforms a fully supervised baseline
using the same neural network architecture on the task of pose estimation. The
discovered 3D keypoints on the car, chair, and plane categories of ShapeNet are
visualized at keypointnet.github.io.