
TRILLSSON: DISTILLING UNIVERSAL PARALINGUISTIC SPEECH REPRESENTATIONS

Interspeech 2022

Abstract

Recent advances in self-supervision have dramatically improved the quality of speech representations. However, wide deployment of state-of-the-art embedding models on devices has been severely restricted due to their limited public availability and large resource footprint. Our work addresses these problems by publicly releasing a collection of paralinguistic speech models that are small and near state-of-the-art in performance. Our approach is based on knowledge distillation, and our models are distilled only on public data. We explore different architectures and thoroughly evaluate our models on the Non-Semantic Speech (NOSS) benchmark. Our largest distilled model is less than 16% the size of the original model (340MB vs 2.2GB) and achieves over 94% of its accuracy on 6 of 7 tasks. The smallest model is less than 0.3% of the size (22MB) and achieves over 90% of the accuracy on 6 of 7 tasks.
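As a rough illustration of the distillation setup described in the abstract, the sketch below trains a small student network to regress the embeddings of a frozen teacher with an MSE loss. The teacher here is a random stand-in to keep the example self-contained, and all module names, dimensions, and hyperparameters are assumptions for illustration, not the paper's actual architecture or training recipe.

```python
# Minimal sketch of embedding-level knowledge distillation: a small student is
# trained to reproduce a frozen teacher's fixed-size speech embeddings.
# All names and sizes below are hypothetical, chosen only for illustration.
import torch
import torch.nn as nn

EMBED_DIM = 1024          # assumed teacher embedding size
SAMPLES = 16000 * 2       # 2 s of 16 kHz audio per clip

class TinyStudent(nn.Module):
    """Small stand-in student: 1-D conv front end + average pooling + projection."""
    def __init__(self, embed_dim: int = EMBED_DIM):
        super().__init__()
        self.frontend = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=400, stride=160),  # ~25 ms frames, 10 ms hop
            nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, stride=2),
            nn.ReLU(),
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        feats = self.frontend(wav.unsqueeze(1))   # (batch, channels, frames)
        pooled = feats.mean(dim=-1)               # global average pool over time
        return self.proj(pooled)                  # (batch, EMBED_DIM)

def distillation_step(student, teacher, wav, optimizer):
    """One training step: match the frozen teacher's embedding with an MSE loss."""
    with torch.no_grad():
        target = teacher(wav)                     # teacher embedding, no gradients
    loss = nn.functional.mse_loss(student(wav), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Stand-in "teacher": a frozen random projection, just to make the sketch run.
    teacher = nn.Sequential(nn.Flatten(), nn.Linear(SAMPLES, EMBED_DIM)).eval()
    for p in teacher.parameters():
        p.requires_grad_(False)

    student = TinyStudent()
    opt = torch.optim.Adam(student.parameters(), lr=1e-4)
    batch = torch.randn(4, SAMPLES)               # fake audio batch
    print("distillation loss:", distillation_step(student, teacher, batch, opt))
```

The key property this sketch reflects is that the student never sees task labels: it only learns to approximate the teacher's embedding space, which is what allows much smaller models to retain most of the downstream accuracy on the benchmark tasks.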
