Symbolic Music Generation with Diffusion Models

Gautam Mittal; Jesse Engel; Curtis Glenn-Macway Hawthorne; Ian Simon

ISMIR 2021 (to appear)


Abstract

Score-based generative models and diffusion probabilistic models have been successful at generating high-quality samples in continuous domains such as images and audio. However, due to their Langevin-inspired sampling mechanisms, their application to discrete and sequential data has been limited. In this work, we present a technique for training diffusion models on sequential data by parameterizing the discrete domain in the continuous latent space of a pre-trained variational autoencoder. Our method is non-autoregressive: it learns to generate sequences of latent embeddings through the reverse process of a Markov chain, and it offers parallel generation with a constant number of iterative refinement steps. We apply this technique to modeling symbolic music and show promising unconditional generation results compared to an autoregressive language model operating over the same continuous embeddings.
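
As a rough illustration of the sampling idea in the abstract, the following minimal sketch runs a standard DDPM-style reverse Markov chain over a whole sequence of latent embeddings at once. It is not the authors' implementation: the linear noise schedule, the step count, the array sizes, and the names `linear_beta_schedule`, `dummy_noise_model`, and `sample_latents` are all assumptions made for illustration, and the noise predictor is a placeholder for a trained network. Turning the resulting latents into notes would additionally require the decoder of the pre-trained variational autoencoder, which is not shown.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; the values are illustrative only."""
    return np.linspace(beta_start, beta_end, num_steps)


def dummy_noise_model(x_t, t):
    """Hypothetical stand-in for a trained noise-prediction network.

    A real model would predict the noise contained in x_t at step t.
    """
    return np.zeros_like(x_t)


def sample_latents(noise_model, seq_len, latent_dim, num_steps=50, seed=0):
    """Run the reverse Markov chain from pure noise to latent embeddings.

    Every position in the sequence is updated in parallel at each step,
    so the number of model calls is num_steps regardless of seq_len.
    """
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from Gaussian noise over the full sequence of embeddings.
    x = rng.standard_normal((seq_len, latent_dim))

    for t in reversed(range(num_steps)):
        eps_hat = noise_model(x, t)
        # Posterior mean of the DDPM reverse step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Add noise on every step except the last (variance beta_t is one common choice).
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x


# Arbitrary sizes chosen only for the example.
latents = sample_latents(dummy_noise_model, seq_len=8, latent_dim=16)
```

In contrast to an autoregressive model, which would need one pass per sequence position, the loop above makes `num_steps` passes in total, each refining all positions together.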

Research Areas

Machine intelligence
