Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos

Ankita Pasad
Ariel Gordon
Tsung-Yi Lin
CVPR 2020 Workshop on Learning from Unlabeled Videos (to appear)

Abstract

We leverage unsupervised learning of depth, egomotion, and camera intrinsics to improve the performance of single-image semantic segmentation, by enforcing 3D-geometric and temporal consistency of segmentation masks across video frames. The predicted depth, egomotion, and camera intrinsics provide an additional supervision signal to the segmentation model, significantly improving its quality or, alternatively, reducing the number of labels it needs. Our experiments were performed on the ScanNet dataset.
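To make the consistency signal concrete, below is a minimal sketch of the kind of cross-frame warping the abstract describes: source-frame pixels are back-projected to 3D with the predicted depth, moved with the predicted egomotion, and re-projected with the predicted intrinsics so that segmentation logits from a neighboring frame can be resampled and compared against the source-frame prediction. All function names, tensor shapes, and the L1 penalty are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F


def warp_segmentation(seg_logits_tgt, depth_src, K, R, t):
    """Warp target-frame segmentation logits into the source frame.

    seg_logits_tgt: (B, C, H, W) logits predicted for the target frame.
    depth_src:      (B, 1, H, W) depth predicted for the source frame.
    K:              (B, 3, 3) predicted camera intrinsics.
    R, t:           (B, 3, 3) rotation and (B, 3, 1) translation of the
                    predicted source-to-target egomotion.
    (Hypothetical interface, assumed for illustration.)
    """
    B, _, H, W = depth_src.shape
    device = depth_src.device

    # Homogeneous pixel grid (u, v, 1) for the source frame.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(1, 3, -1)

    # Back-project to 3D with predicted depth, apply predicted egomotion,
    # then re-project with the predicted intrinsics.
    cam = torch.inverse(K) @ pix * depth_src.reshape(B, 1, -1)  # (B, 3, H*W)
    cam = R @ cam + t                                           # rigid motion
    proj = K @ cam
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)             # (B, 2, H*W)

    # Normalize coordinates to [-1, 1] and resample the target logits.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(seg_logits_tgt, grid, align_corners=True)


def consistency_loss(seg_logits_src, seg_logits_tgt, depth_src, K, R, t):
    """Penalize disagreement between the source-frame prediction and the
    target-frame prediction warped into the source frame (assumed L1 form)."""
    warped = warp_segmentation(seg_logits_tgt, depth_src, K, R, t)
    p_src = F.softmax(seg_logits_src, dim=1)
    p_warp = F.softmax(warped, dim=1)
    return (p_src - p_warp).abs().mean()
```

In a training loop, this loss would be added to the standard supervised segmentation loss on whatever labeled frames are available, which is how the consistency term can substitute for some fraction of the labels.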