Neighbourhood Distillation: On the benefits of non end-to-end distillation

Laëtitia Shao; Max Moroz; Elad Eban; Yair Movshovitz-Attias


arXiv (2020)


Abstract

Knowledge Distillation is a popular method for reducing model size by transferring the knowledge of a large teacher model to a smaller student network. We show that sub-parts of a network can be replaced independently without loss of accuracy. Based on this, we propose a distillation method that breaks the end-to-end paradigm by splitting the teacher architecture into smaller sub-networks, called neighbourhoods. For each neighbourhood we distill a student independently, and we then merge these students into a single student model. We show that this process is significantly faster than end-to-end Knowledge Distillation and produces students of the same quality.
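The split, distill, and merge procedure can be illustrated with a toy sketch. It is not the authors' implementation. Scalar functions stand in for neighbourhoods, and a closed-form least-squares fit (the hypothetical `fit_affine`) stands in for training a small student sub-network with gradient descent. The sketch also assumes, in line with the independence described above, that each student is trained only on the teacher's own activations at that neighbourhood's input and output boundaries. `teacher_activations`, `distill_neighbourhoods` and `merge` are illustrative names.

```python
from typing import Callable, List, Sequence, Tuple

Block = Callable[[float], float]  # toy "neighbourhood": scalar in, scalar out


def fit_affine(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical student trainer: closed-form least squares for y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var if var > 0 else 0.0
    return a, my - a * mx


def teacher_activations(teacher: Sequence[Block], data: Sequence[float]) -> List[List[float]]:
    """Run the teacher once, recording the activations at every neighbourhood boundary."""
    acts = [list(data)]
    for block in teacher:
        acts.append([block(x) for x in acts[-1]])
    return acts


def distill_neighbourhoods(teacher: Sequence[Block], data: Sequence[float]) -> List[Tuple[float, float]]:
    """Fit one student per neighbourhood, using only the teacher's boundary activations."""
    acts = teacher_activations(teacher, data)
    # No fit reads another student's output, so these fits are independent
    # of one another and could run in parallel.
    return [fit_affine(acts[i], acts[i + 1]) for i in range(len(teacher))]


def merge(students: Sequence[Tuple[float, float]]) -> Block:
    """Chain the independently distilled students into a single student model."""
    def student_model(x: float) -> float:
        for a, b in students:
            x = a * x + b
        return x
    return student_model


if __name__ == "__main__":
    teacher = [lambda x: 2 * x + 1, lambda x: -0.5 * x + 3, lambda x: 4 * x]
    data = [float(i) for i in range(-5, 6)]
    student = merge(distill_neighbourhoods(teacher, data))
    print(student(1.0))  # prints 6.0, matching 4 * (-0.5 * (2 * 1 + 1) + 3)
```

In this toy the teacher blocks are themselves affine, so each student recovers its neighbourhood exactly. With real networks, a student is a smaller architecture that only approximates its neighbourhood.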
Building on Neighbourhood Distillation, we design Student Search, an architecture search that uses the independently distilled candidates to explore an exponentially large space of architectures and locally selects the best candidate to use in the student model.
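Student Search can be sketched in the same toy setting. Each neighbourhood gets a small, hypothetical menu of candidate students (`identity`, `scale`, `affine`), each with an illustrative parameter cost. Every candidate is distilled independently on the teacher's activations, and one candidate is kept per neighbourhood. The selection rule used here is an assumption, not the paper's criterion: keep the cheapest candidate whose local error is below `tol`, and otherwise fall back to the most accurate one. With k candidates and n neighbourhoods, this local choice covers k^n combined architectures while performing only n·k independent fits.

```python
from typing import Callable, List, Sequence, Tuple

Block = Callable[[float], float]  # toy "neighbourhood": scalar in, scalar out


def fit_identity(xs: Sequence[float], ys: Sequence[float]) -> Block:
    return lambda x: x  # zero-parameter candidate: skip the neighbourhood


def fit_scale(xs: Sequence[float], ys: Sequence[float]) -> Block:
    sxx = sum(x * x for x in xs)
    a = sum(x * y for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    return lambda x: a * x


def fit_affine(xs: Sequence[float], ys: Sequence[float]) -> Block:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var if var > 0 else 0.0
    b = my - a * mx
    return lambda x: a * x + b


# Hypothetical per-neighbourhood search space: (name, parameter cost, trainer).
CANDIDATES = [("identity", 0, fit_identity), ("scale", 1, fit_scale), ("affine", 2, fit_affine)]


def mse(f: Block, xs: Sequence[float], ys: Sequence[float]) -> float:
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def student_search(teacher: Sequence[Block], data: Sequence[float],
                   tol: float = 1e-9) -> List[Tuple[str, Block]]:
    """Pick one candidate per neighbourhood using only that neighbourhood's local error."""
    acts = [list(data)]
    for block in teacher:  # teacher activations at every boundary
        acts.append([block(x) for x in acts[-1]])
    chosen = []
    for i in range(len(teacher)):
        xs, ys = acts[i], acts[i + 1]
        # Every candidate is distilled independently on the same teacher activations.
        scored = []
        for name, cost, trainer in CANDIDATES:
            f = trainer(xs, ys)
            scored.append((name, cost, f, mse(f, xs, ys)))
        good = [c for c in scored if c[3] <= tol]
        # Cheapest candidate within tolerance, else the most accurate one.
        best = min(good, key=lambda c: c[1]) if good else min(scored, key=lambda c: c[3])
        chosen.append((best[0], best[2]))
    return chosen


def merge(chosen: Sequence[Tuple[str, Block]]) -> Block:
    """Chain the selected students into a single student model."""
    def student_model(x: float) -> float:
        for _, f in chosen:
            x = f(x)
        return x
    return student_model


if __name__ == "__main__":
    teacher = [lambda x: x, lambda x: 3 * x, lambda x: x - 2]
    data = [float(i) for i in range(-3, 4)]
    chosen = student_search(teacher, data)
    print([name for name, _ in chosen])  # ['identity', 'scale', 'affine']
```

Because the choice in each neighbourhood ignores the others, the search never enumerates the k^n combinations explicitly. The trade-off is that interactions between neighbourhoods are not scored.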
We apply Neighbourhood Distillation and Student Search to model reduction and sparsification problems on CIFAR-10 and ImageNet models. Our method offers up to a 4.6× speed-up over end-to-end distillation methods while retaining the same performance.

Research Areas

Machine intelligence
