On Mutual Information Maximization for Representation Learning

Michael Tobias Tschannen; Josip Djolonga; Paul Kishan Rubenstein; Sylvain Gelly; Mario Lučić

International Conference on Learning Representations (2020)

Abstract

Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: for example, MI is notoriously hard to estimate, and using it as a representation-learning objective may lead to highly entangled representations because MI is invariant under arbitrary invertible transformations. Nevertheless, these methods have repeatedly been shown to excel in practice. In this paper we argue, and provide empirical evidence, that the success of these methods may be only loosely attributable to the properties of MI, and that they depend strongly on the inductive bias in both the choice of feature extractor architecture and the parametrization of the employed MI estimators. Finally, we establish a connection to deep metric learning and argue that this interpretation may plausibly explain the success of these recently introduced methods.
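
To make "maximizing an estimate of the MI between different views" concrete, here is a minimal NumPy sketch of the InfoNCE lower bound, one widely used estimator of this kind. It is an illustration under stated assumptions, not the paper's code: the function name infonce_lower_bound, the bilinear critic f(x, y) = x^T W y with W fixed to the identity, and the synthetic two-view data are all hypothetical choices made for this example. Two properties are visible in the code. First, the estimate can never exceed log K for a batch of K pairs. Second, it is a softmax cross-entropy over critic scores with the positive pair on the diagonal, the same form as metric-learning losses such as the multi-class N-pair loss.

```python
import numpy as np


def infonce_lower_bound(scores: np.ndarray) -> float:
    """InfoNCE estimate from a K x K critic matrix (illustrative sketch).

    scores[i, j] = f(x_i, y_j). Diagonal entries are positive pairs (two views
    of the same example); off-diagonal entries act as negatives. Returns
        mean_i [ f(x_i, y_i) - log( (1/K) * sum_j exp f(x_i, y_j) ) ],
    which is at most log K because each row's log-sum-exp is >= its diagonal.
    """
    k = scores.shape[0]
    # Numerically stable row-wise log-sum-exp.
    row_max = scores.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    # Softmax cross-entropy with the positive on the diagonal, shifted by log K.
    return float(np.mean(np.diag(scores) - lse + np.log(k)))


# Hypothetical synthetic data: two noisy "views" of a shared latent z.
rng = np.random.default_rng(0)
k, d = 128, 16
z = rng.normal(size=(k, d))
x = z + 0.1 * rng.normal(size=(k, d))  # view 1
y = z + 0.1 * rng.normal(size=(k, d))  # view 2
w = np.eye(d)  # hypothetical bilinear critic weight (identity for simplicity)

est_paired = infonce_lower_bound(x @ w @ y.T)

# Independent views for comparison: no shared latent between x and y_indep.
y_indep = rng.normal(size=(k, d))
est_indep = infonce_lower_bound(x @ w @ y_indep.T)

print(est_paired, est_indep, np.log(k))
```

The value returned depends on the critic (here the choice of W) and on any encoders applied to the views, not only on the true MI between them. This is one simple way to see how the parametrization of the estimator enters the objective.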

Research Areas

Machine intelligence
