Semantically Meaningful Attributes from Cowatch Embeddings for Playlist Exploration and Expansion
Abstract
Audio embeddings for musical similarity are often used for autoplay discovery. These embeddings are typically learned by training a deep neural network on co-listen data, so that it provides consistent triplet-loss distances. Instead of using the co-listen–based embeddings directly, we create an embedding space by training classifiers for attributes that describe music in human terms. This attribute-embedding space allows us to provide recommendations, for use by music curators, that are less likely to be completely unintelligible. Each attribute used in this embedding space is built on top of the co-listen–based embeddings, sometimes with additional metadata inputs. We examine the relative performance of these two embedding spaces (the co-listen audio embedding and the attribute embedding) for the mathematical separation of thematic playlists. We also report on the usefulness of recommendations from the attribute-embedding space to human curators.