Self-Supervised Learning of Video-Induced Visual Invariances

Michael Tobias Tschannen
Josip Djolonga
Sylvain Gelly
Conference on Computer Vision and Pattern Recognition (2020)

Abstract

We propose a general framework for self-supervised learning of transferable visual representations based on Video-Induced Visual Invariances (VIVI). We make use of the natural hierarchy consisting of (i) frame-level invariances (e.g. color and contrast robustness), (ii) shot/clip-level invariances (e.g. robustness to changes in object orientation and lighting conditions), and (iii) video-level invariances (semantic relationships of scenes across shots/clips) to define a holistic self-supervised loss. We train the proposed model on the YouTube-8M dataset and show that this approach leads to state-of-the-art self-supervised results on the 19 diverse downstream tasks of the Visual Task Adaptation Benchmark (VTAB). We then show how to co-train the model jointly with labeled images, outperforming an ImageNet-pretrained ResNet-50 with 10× fewer labeled images.
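To make the three-level hierarchy concrete, the sketch below shows one way a holistic objective of this kind could be assembled as a weighted sum of a frame-level, a shot-level, and a video-level term. The function names, loss choices, and weights are illustrative assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch of a VIVI-style hierarchical self-supervised objective.
# All embeddings are assumed to be L2-normalized feature vectors.
import torch
import torch.nn.functional as F


def frame_level_loss(frame_emb, augmented_emb, temperature=0.1):
    """Contrastive term encouraging invariance to per-frame augmentations
    (e.g. color and contrast changes). Positive pairs share the same row index."""
    logits = frame_emb @ augmented_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


def shot_level_loss(shot_emb):
    """Pull frames from the same shot/clip toward their shot centroid,
    encouraging robustness to object orientation and lighting changes.
    shot_emb: (num_shots, frames_per_shot, dim)."""
    centroids = shot_emb.mean(dim=1, keepdim=True)
    return ((shot_emb - centroids) ** 2).sum(dim=-1).mean()


def video_level_loss(shots_a, shots_b, temperature=0.1):
    """Relate different shots drawn from the same video (semantic relationships
    across shots), contrasting them against shots from other videos."""
    logits = shots_a @ shots_b.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


def vivi_loss(frame_emb, augmented_emb, shot_emb, shots_a, shots_b,
              w_frame=1.0, w_shot=1.0, w_video=1.0):
    """Holistic objective: weighted sum over the invariance hierarchy.
    The weights are hypothetical hyperparameters."""
    return (w_frame * frame_level_loss(frame_emb, augmented_emb)
            + w_shot * shot_level_loss(shot_emb)
            + w_video * video_level_loss(shots_a, shots_b))
```

In a co-training setup, a supervised classification loss on labeled images could simply be added as a fourth weighted term on top of this self-supervised objective.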