Google Research

Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows

European Conference on Computer Vision (ECCV) (2020), pp. 465-481


Monocular 3D human pose and shape estimation is challenging due to the many degrees of freedom of the human body and the difficulty of acquiring training data for large-scale supervised learning in complex visual scenes. In this paper we present practical semi-supervised and self-supervised models that support training and good generalization in real-world images and video. Our formulation is based on kinematic latent normalizing flow representations and dynamics, as well as differentiable, semantic body part alignment loss functions that support self-supervised learning. In extensive experiments using 3D motion capture datasets like CMU, Human3.6M, 3DPW, or AMASS, as well as image repositories like COCO, we show that the proposed methods outperform the state of the art, supporting the practical construction of an accurate family of models based on large-scale training with diverse and incompletely labeled image and video data.
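To make the notion of a latent normalizing flow representation concrete, the following is a minimal sketch of a single RealNVP-style affine coupling layer in NumPy. It is not the paper's architecture: the layer type, the 6-dimensional toy vector standing in for pose parameters, the class name `AffineCoupling`, and the linear maps `W_s` and `W_t` are all assumptions chosen for illustration. The sketch only shows the two properties a flow relies on, an exact inverse between latent codes and data, and a tractable log-determinant for evaluating densities.

```python
import numpy as np


class AffineCoupling:
    """One affine coupling layer (illustrative only, not the paper's model)."""

    def __init__(self, dim, rng):
        self.d = dim // 2
        # Hypothetical tiny linear maps standing in for the scale/shift networks.
        self.W_s = 0.1 * rng.standard_normal((self.d, dim - self.d))
        self.W_t = 0.1 * rng.standard_normal((self.d, dim - self.d))

    def forward(self, z):
        """Map latent codes z to data x; also return log|det dx/dz|."""
        z1, z2 = z[:, :self.d], z[:, self.d:]
        log_s = np.tanh(z1 @ self.W_s)  # bounded log-scale for stability
        t = z1 @ self.W_t
        x2 = z2 * np.exp(log_s) + t     # first half passes through unchanged
        return np.concatenate([z1, x2], axis=1), log_s.sum(axis=1)

    def inverse(self, x):
        """Map data x back to latent codes z; also return log|det dz/dx|."""
        x1, x2 = x[:, :self.d], x[:, self.d:]
        log_s = np.tanh(x1 @ self.W_s)
        t = x1 @ self.W_t
        z2 = (x2 - t) * np.exp(-log_s)
        return np.concatenate([x1, z2], axis=1), -log_s.sum(axis=1)


def log_prob(layer, x):
    """Log-density of x under a standard normal base, via change of variables."""
    z, log_det_inv = layer.inverse(x)
    dim = x.shape[1]
    log_pz = -0.5 * (z ** 2).sum(axis=1) - 0.5 * dim * np.log(2 * np.pi)
    return log_pz + log_det_inv


rng = np.random.default_rng(0)
layer = AffineCoupling(dim=6, rng=rng)   # 6 is a toy size, not a real pose dimension
z = rng.standard_normal((4, 6))          # samples from the Gaussian latent prior
x, log_det = layer.forward(z)            # "decode" latent codes to toy pose vectors
z_rec, log_det_inv = layer.inverse(x)    # "encode" them back exactly
```

Because the mapping is invertible, sampling a Gaussian latent code and pushing it through `forward` yields a sample from the modeled distribution, while `log_prob` scores any given vector; a practical model would stack several such layers with learned networks in place of the fixed linear maps.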
