Teacher's pet: understanding and mitigating biases in distillation

Michal Lukasik; Srinadh Bhojanapalli; Aditya Krishna Menon; Sanjiv Kumar


Transactions on Machine Learning Research (2022)


Abstract

Knowledge distillation is widely used to improve the performance of a relatively simple student model by training it on the predictions of a complex teacher model. Several works have shown that distillation significantly boosts the student's overall performance. However, are these gains uniform across all data subgroups? In this paper, we show that distillation can harm performance on certain subgroups, e.g., classes with few associated samples, compared to a vanilla student trained on one-hot labels. We trace this behaviour to errors in the teacher's predicted distribution, which are transferred to, and amplified by, the student model. To mitigate this problem, we present techniques that soften the teacher's influence on subgroups where it is less reliable. Experiments on several image classification benchmarks show that these modifications of distillation maintain the boost in overall accuracy while also improving subgroup performance.
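To make the idea of "softening the teacher's influence" concrete, here is a minimal sketch in plain Python. It shows a standard distillation target that mixes the one-hot label with the teacher's probabilities. The mixing weight depends on the example's class and is scaled down for classes where the teacher is inaccurate. The per-class heuristic `alpha_k = alpha_max * teacher_accuracy_k` and all function names are hypothetical illustrations. This is not the specific technique proposed in the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional distillation temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def per_class_teacher_accuracy(
    teacher_preds: Sequence[int], labels: Sequence[int], num_classes: int
) -> List[float]:
    """Fraction of examples of each true class that the teacher predicts correctly."""
    correct = [0] * num_classes
    count = [0] * num_classes
    for pred, y in zip(teacher_preds, labels):
        count[y] += 1
        correct[y] += int(pred == y)
    # Classes with no examples get accuracy 0.0 (i.e. no trust in the teacher).
    return [c / n if n else 0.0 for c, n in zip(correct, count)]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    alpha_per_class: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Cross-entropy of the student against a per-class mixture of label and teacher.

    alpha_per_class[label] is the weight on the teacher's distribution;
    alpha = 0 recovers plain one-hot training, alpha = 1 is pure distillation.
    """
    alpha = alpha_per_class[label]
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    target = [
        (1.0 - alpha) * (1.0 if k == label else 0.0) + alpha * p_teacher[k]
        for k in range(len(p_teacher))
    ]
    # Clamp probabilities to avoid log(0).
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, p_student))


if __name__ == "__main__":
    # Hypothetical teacher predictions: the teacher is unreliable on class 1.
    acc = per_class_teacher_accuracy([0, 0, 1, 2, 2, 2], [0, 0, 1, 1, 2, 2], 3)
    alpha_max = 0.9  # hypothetical upper bound on teacher weight
    alphas = [alpha_max * a for a in acc]
    print(acc, alphas)
    print(distillation_loss([2.0, 0.5, -1.0], [0.1, 3.0, 0.2], 0, alphas))
```

Tying the weight to per-class teacher accuracy is one simple way to express "trust the teacher less where it errs more". Classes on which the teacher is frequently wrong fall back toward ordinary one-hot training.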

Research Areas

Machine intelligence
