MidiMe: Personalizing a MusicVAE model with user data
Abstract
One area of interest for generative music models is empowering individual expression. But how can a creator personalize a machine learning model to make it their own?
Training a custom deep neural network model like Music Transformer, MusicVAE, or SketchRNN from scratch requires significant amounts of data (millions of examples), compute resources (specialized hardware like GPUs/TPUs), and expertise in hyperparameter tuning. Without sufficient data, models either fail to produce realistic output (underfitting) or memorize the training examples and cannot generalize to produce varied outputs (overfitting); it would be like trying to learn all of music theory from a single song.
We introduce a new model for sample-efficient adaptation to user data, based on prior work by Engel et al. [1]. We can quickly train this small, personalized model to control a much larger, more general pretrained latent variable model. This allows us to generate samples from only the portions of the latent space we are interested in, without having to retrain the large model from scratch. We demonstrate this technique in an online demo that lets users upload their own MIDI files (either melodies or multi-instrument songs) and generate samples that sound like their input.
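To make the idea concrete, the following is a minimal sketch (in PyTorch) of one way to implement "a small model controlling a large pretrained latent variable model": a tiny VAE is trained directly on latent codes produced by a frozen pretrained MusicVAE encoder, so that sampling the small model's low-dimensional latent space and decoding through the large model yields output resembling the user's MIDI. All class names, dimensions, and hyperparameters here are illustrative assumptions, not the paper's actual code.

```python
# Illustrative sketch only: a small VAE over the latents of a frozen pretrained model.
import torch
import torch.nn as nn

class SmallLatentVAE(nn.Module):
    """Tiny VAE that models the distribution of pretrained MusicVAE latents (assumed 256-D)."""
    def __init__(self, big_z_dim=256, small_z_dim=4, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(big_z_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, small_z_dim)
        self.logvar = nn.Linear(hidden, small_z_dim)
        self.dec = nn.Sequential(nn.Linear(small_z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, big_z_dim))

    def forward(self, z_big):
        h = self.enc(z_big)
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)
        z_small = mu + eps * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z_small), mu, logvar

def train_step(model, optimizer, z_big, beta=1.0):
    # Reconstruct the big model's latents and regularize the small latent space.
    recon, mu, logvar = model(z_big)
    recon_loss = ((recon - z_big) ** 2).sum(dim=1).mean()
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
    loss = recon_loss + beta * kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: `user_latents` would come from encoding the user's MIDI with the
# frozen pretrained MusicVAE encoder; random data stands in here for shape only.
model = SmallLatentVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
user_latents = torch.randn(32, 256)  # placeholder for Encoder_big(user MIDI)
for epoch in range(100):
    train_step(model, opt, user_latents)

# Generation: sample the small personal latent space, map back into the big model's
# latent space, and (in the real pipeline) decode with the pretrained MusicVAE decoder.
z_personal = torch.randn(8, 4)
z_for_decoder = model.dec(z_personal)
```

Because only this small network is trained, adaptation can run in seconds on a handful of user examples, while all musical structure is still supplied by the frozen pretrained decoder.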