Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras

Ariel Gordon

Hanhan Li

Rico Jonschkowski

Anelia Angelova

The IEEE International Conference on Computer Vision (ICCV) (2019)

Abstract

We present a novel method for simultaneously learning depth, egomotion, object motion, and camera intrinsics from monocular videos, using only consistency across neighboring video frames as a supervision signal. Like prior work, our method learns by applying differentiable warping to frames and comparing the result to adjacent ones, but it makes several improvements. We address occlusions geometrically and differentiably, directly using the depth maps predicted during training. We introduce randomized layer normalization, a novel regularizer, and we account for the motion of objects relative to the scene.
To the best of our knowledge, our work is the first to learn the camera intrinsic parameters, including lens distortion, from video in an unsupervised manner, which allows us to extract accurate depth and motion from arbitrary videos of unknown origin at scale. We evaluate our results on the Cityscapes, KITTI, and EuRoC MAV datasets, establishing a new state of the art on depth prediction and odometry, and demonstrate qualitatively that depth prediction can be learned from a collection of YouTube videos. The code is publicly available at github.com/google-research/google-research/tree/master/depth_from_video_in_the_wild.
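
The frame-to-frame warping that the abstract refers to rests on a standard reprojection step: back-project a pixel using its depth and the camera intrinsics, apply the camera motion, and project the point into the neighboring frame. The sketch below illustrates that step for a single pixel under simplifying assumptions: a pinhole camera without lens distortion, a static scene, and known values in place of learned ones. It is not the authors' implementation, and the function name `reproject_pixel` and its parameters are hypothetical, chosen for illustration only.

```python
def reproject_pixel(u, v, depth, intrinsics, rotation, translation):
    """Map pixel (u, v) with a given depth in frame A to its location in frame B.

    Illustrative assumptions (not taken from the paper's code):
      intrinsics:  (fx, fy, cx, cy) of a pinhole camera, no lens distortion.
      rotation:    3x3 row-major rotation matrix taking frame A to frame B.
      translation: 3-vector, camera translation from frame A to frame B.
    Returns (u_b, v_b, depth_b).
    """
    fx, fy, cx, cy = intrinsics

    # Back-project the pixel to a 3D point in camera A coordinates.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    point_a = (x, y, depth)

    # Apply the rigid motion to express the point in camera B coordinates.
    xb, yb, zb = [
        sum(rotation[i][j] * point_a[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if zb <= 0:
        raise ValueError("point falls behind camera B")

    # Project the moved point back to pixel coordinates in frame B.
    return fx * xb / zb + cx, fy * yb / zb + cy, zb


# Example: no rotation, a 0.5-unit shift of the point along x in camera B.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
result = reproject_pixel(
    50.0, 50.0, 10.0,
    intrinsics=(100.0, 100.0, 50.0, 50.0),
    rotation=identity,
    translation=(0.5, 0.0, 0.0),
)
print(result)  # (55.0, 50.0, 10.0)
```

In a training setup of the kind described in the abstract, the depth, motion, and intrinsics would be network predictions rather than fixed inputs, every pixel would be reprojected, and the neighboring frame would be sampled at the resulting locations with a differentiable interpolation so that the photometric difference can serve as the loss.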

Research Areas

Machine perception
