Towards Learning a Universal Non-Semantic Representation of Speech

Joel Shor
Ronnie Zvi Maor
Omry Tuval
Marco Tagliasacchi
Ira Shavitt
Proc. Interspeech 2020 (2020)

Abstract

The ultimate goal of transfer learning is to enable learning with a small amount of data by using a strong embedding. While significant progress has been made in the visual and language domains, the speech domain lacks such a universal method. This paper presents a new representation of speech signals based on an unsupervised triplet-loss objective, which outperforms both the existing state of the art and other common representations on a number of transfer learning tasks in the non-semantic speech domain. The embedding is learned on a publicly available dataset and tested on a variety of low-resource downstream tasks, including personalization tasks and tasks from the medical domain. The model will be publicly released.
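
To make the triplet-loss objective concrete, the sketch below shows a generic hinge-based triplet loss in PyTorch. The squared-L2 distance, the unit margin, and the function name are illustrative assumptions, not the paper's exact formulation; the comments indicate how anchor, positive, and negative segments would be sampled in a self-supervised setup like the one the abstract describes.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor: torch.Tensor,
                 positive: torch.Tensor,
                 negative: torch.Tensor,
                 margin: float = 1.0) -> torch.Tensor:
    """Hinge-based triplet loss over batches of embeddings.

    In a self-supervised audio setup, the anchor and positive would be
    embeddings of segments drawn close together in time within the same
    clip, and the negative an embedding of a segment from elsewhere.
    Distance metric and margin here are assumptions for illustration.
    """
    d_pos = (anchor - positive).pow(2).sum(dim=1)  # squared L2, anchor vs. positive
    d_neg = (anchor - negative).pow(2).sum(dim=1)  # squared L2, anchor vs. negative
    return F.relu(d_pos - d_neg + margin).mean()   # hinge: pull positives closer than negatives
```

Minimizing this loss pushes embeddings of temporally nearby segments together and embeddings of unrelated segments apart, which is what lets the representation capture non-semantic properties of the signal without labels.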