Layerwise Bregman Representation Learning of Neural Networks with Applications to Knowledge Distillation

Ehsan Amid; Rohan Anil; Christopher Fifty; Manfred Warmuth


Transactions on Machine Learning Research, 02/23 (2023)


Abstract

We propose a new method for layerwise representation learning of a trained neural network that conforms to the non-linearity of the layer's transfer function. In particular, we form a Bregman divergence based on the convex function induced by the layer's transfer function and construct an extension of the original Bregman PCA formulation by incorporating a mean vector and revising the normalization constraint on the principal directions. These modifications allow exporting the learned representation as a fixed layer with a non-linearity. As an application to knowledge distillation, we cast the learning problem for the student network as predicting the compression coefficients of the teacher's representations, which are then passed as the input to the imported layer. Our empirical findings indicate that our approach is substantially more effective for transferring information between networks than standard teacher-student training that uses the teacher's soft labels.
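To make the "Bregman divergence induced by the transfer function" concrete, here is a minimal sketch assuming a softmax layer. Softmax is the gradient of the convex log-sum-exp potential F, so the induced divergence is D_F(a, â) = F(a) − F(â) − f(â)·(a − â), which for softmax equals the KL divergence between the two output distributions. The function `fixed_layer` is a hypothetical illustration of a frozen layer that maps compression coefficients back through a mean vector and principal directions before applying the non-linearity. The parameterization `mean + directions @ coeffs` is an assumption made for this sketch, not the paper's exact formulation or normalization constraint.

```python
import numpy as np

def log_sum_exp(a):
    """Convex potential F(a) = log(sum_i exp(a_i)), computed stably."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def softmax(a):
    """Transfer function f = grad F for the log-sum-exp potential."""
    z = np.exp(a - np.max(a))
    return z / np.sum(z)

def bregman_divergence(a, a_hat):
    """D_F(a, a_hat) = F(a) - F(a_hat) - f(a_hat) . (a - a_hat)."""
    return log_sum_exp(a) - log_sum_exp(a_hat) - softmax(a_hat) @ (a - a_hat)

def fixed_layer(coeffs, mean, directions):
    """Hypothetical exported layer (assumed form): reconstruct pre-activations
    as mean + directions @ coeffs, then apply the layer's softmax."""
    return softmax(mean + directions @ coeffs)

# Usage: for softmax, D_F(a, a_hat) equals KL(softmax(a_hat) || softmax(a)).
rng = np.random.default_rng(0)
a = rng.normal(size=5)      # pre-activations of one example
a_hat = rng.normal(size=5)  # an approximation of them
q, p = softmax(a_hat), softmax(a)
kl = np.sum(q * (np.log(q) - np.log(p)))
print(np.isclose(bregman_divergence(a, a_hat), kl))
```

Measuring reconstruction error with D_F rather than squared Euclidean distance scores the approximation in terms of the layer's outputs, which is the sense in which the representation "conforms to the non-linearity" of the layer.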

Research Areas

Machine intelligence
