Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible

Daniel Duckworth

Ethan S Dyer

Jascha Sohl-dickstein

Neha Wadia

Sam S. Schoenholz

ICML Spotlight (2021) (to appear)

Download Google Scholar

Abstract

We show theoretically and experimentally that both data whitening and second order optimization erase information about the training dataset, and can prevent any generalization for high dimensional datasets. First we show that if the input layer of a model is a dense linear layer, then the datapoint-datapoint second moment matrix contains all information which can be used to make predictions. Second, we show that for high dimensional datasets, where the number of features is at least as large as the number of datapoints, and where the whitening transform is computed on the full (train+test) dataset, whitening erases all information in this datapoint-datapoint second moment matrix. Generalization is thus completely impossible for models trained on high dimensional whitened datasets. Second order optimization of a linear model is identical to first order optimization of the same model after data whitening. Second order optimization can thus also prevent any generalization in similar situations. We experimentally verify these predictions for models trained on whitened data, and for linear models trained with an online Newton optimizer. We further experimentally demonstrate that generalization continues to be harmed even when the theoretical constraints on input dimensionality (for whitening), or linearity of the model (for second order optimization) are relaxed.

Research Areas

Machine Intelligence

Defining the technology of today and tomorrow.

Philosophy

People

Research areas

Foundational ML & Algorithms

Computing Systems & Quantum AI

Science, AI & Society

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible

Abstract

Research Areas

Meet the teams driving innovation