Google Research

Weighted distillation with unlabeled examples

NeurIPS 2022

Abstract

Distillation with unlabeled examples is a popular and powerful method for training deep neural networks in settings where the amount of labeled data is limited: a large "teacher" neural network is trained on the available labeled data and then used to generate labels on an unlabeled dataset (typically much larger in size). These labels are then used to train the smaller "student" model that will actually be deployed. The main drawback of the method is that the teacher often generates inaccurate labels, confusing the student. This paper proposes a principled approach for addressing this issue based on importance reweighting. Our method is hyperparameter-free, efficient, data-agnostic, and simple to implement, and it applies to both "hard" and "soft" distillation. We accompany our results with a theoretical analysis that rigorously justifies the performance of our method in certain settings. Finally, we demonstrate significant improvements over conventional (unweighted) distillation with unlabeled examples on popular academic datasets.
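To make the pipeline described above concrete, here is a minimal, self-contained sketch of weighted distillation with unlabeled examples on toy data. The three stages (train the teacher on labeled data, pseudo-label a larger unlabeled set, train the student on weighted pseudo-labels) follow the abstract; the specific weighting rule shown, normalized teacher confidence, is an illustrative assumption and not the paper's importance-reweighting scheme, and all names and sizes here are hypothetical.

```python
# Sketch of distillation with unlabeled examples plus per-example weights.
# The confidence-based weights below are an illustrative stand-in for the
# paper's importance weights, not its actual method.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
NUM_CLASSES, DIM = 10, 32

# --- Step 1: train the large "teacher" on the small labeled set. ---
x_labeled = torch.randn(256, DIM)                   # toy labeled inputs
y_labeled = torch.randint(0, NUM_CLASSES, (256,))   # toy labels
teacher = nn.Sequential(nn.Linear(DIM, 256), nn.ReLU(),
                        nn.Linear(256, NUM_CLASSES))
opt = torch.optim.Adam(teacher.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    F.cross_entropy(teacher(x_labeled), y_labeled).backward()
    opt.step()

# --- Step 2: the teacher labels a (much larger) unlabeled set. ---
x_unlabeled = torch.randn(4096, DIM)
with torch.no_grad():
    teacher_probs = F.softmax(teacher(x_unlabeled), dim=1)  # "soft" labels
    pseudo_labels = teacher_probs.argmax(dim=1)             # "hard" labels
    # Illustrative weights: teacher confidence, normalized to mean 1
    # (a hypothetical choice standing in for importance reweighting).
    conf = teacher_probs.max(dim=1).values
    weights = conf / conf.mean()

# --- Step 3: train the small "student" on the weighted pseudo-labels. ---
student = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(),
                        nn.Linear(64, NUM_CLASSES))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    per_example = F.cross_entropy(student(x_unlabeled), pseudo_labels,
                                  reduction="none")  # "hard" distillation
    (weights * per_example).mean().backward()        # weighted loss
    opt.step()
```

For "soft" distillation, the per-example loss in Step 3 would instead compare the student's log-probabilities against `teacher_probs` (e.g., with a KL-divergence term), with the same per-example weights applied.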
