Learning Latent Representations of Music to Generate Interactive Musical Palettes
Abstract
Advances in machine learning have the potential to radically
reshape interactions between humans and computers. Deep
learning makes it possible to discover powerful representations
that are capable of capturing the latent structure of high-dimensional
data such as music. By creating interactive latent
space “palettes” of musical sequences and timbres, we
demonstrate interfaces for musical creation made possible
by machine learning. We introduce an interface to intuitive,
low-dimensional control spaces for high-dimensional
note sequences, allowing users to explore a compositional
space of melodies or drum beats in a simple 2-D grid. Furthermore,
users can define 1-D trajectories in the 2-D space
for autonomous, continuous morphing during improvisation.
Similarly for timbre, our interface to a learned latent space
of audio provides an intuitive and smooth search space for
morphing between the timbres of different instruments. We
remove technical and computational barriers by embedding
pre-trained networks into a browser-based GPU-accelerated
framework, making the systems accessible to a wide range of
users while maintaining potential for creative flexibility and
personalization.
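
The 2-D grid and the 1-D trajectories described above can be illustrated with a minimal sketch. It assumes the palette is spanned by four corner latent vectors and filled in by bilinear interpolation, which is only one plausible construction. The names `lerp`, `latent_at`, `build_palette`, and `sample_trajectory` are hypothetical and are not taken from the system itself. Decoding each latent vector back into notes or audio would be done by a pre-trained network and is omitted here.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Corners = Tuple[Sequence[float], Sequence[float], Sequence[float], Sequence[float]]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Linearly interpolate between latent vectors a and b (t in [0, 1])."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def latent_at(corners: Corners, u: float, v: float) -> Vector:
    """Bilinear blend of the four corner latents at unit-square position (u, v).

    Assumed corner order: top-left, top-right, bottom-left, bottom-right.
    u runs left to right, v runs top to bottom.
    """
    top_left, top_right, bottom_left, bottom_right = corners
    left = lerp(top_left, bottom_left, v)
    right = lerp(top_right, bottom_right, v)
    return lerp(left, right, u)


def build_palette(corners: Corners, size: int) -> List[List[Vector]]:
    """Return a size x size grid of latent vectors covering the palette."""
    if size < 2:
        raise ValueError("size must be at least 2")
    return [
        [latent_at(corners, col / (size - 1), row / (size - 1)) for col in range(size)]
        for row in range(size)
    ]


def sample_trajectory(
    corners: Corners,
    waypoints: Sequence[Tuple[float, float]],
    steps_per_segment: int,
) -> List[Vector]:
    """Sample latents along a piecewise-linear 1-D path through the 2-D palette."""
    if len(waypoints) < 2 or steps_per_segment < 1:
        raise ValueError("need at least two waypoints and one step per segment")
    path: List[Vector] = []
    for (u0, v0), (u1, v1) in zip(waypoints, waypoints[1:]):
        for step in range(steps_per_segment):
            t = step / steps_per_segment
            path.append(latent_at(corners, u0 + t * (u1 - u0), v0 + t * (v1 - v0)))
    # Include the final waypoint so the path ends exactly where the user drew it.
    u_end, v_end = waypoints[-1]
    path.append(latent_at(corners, u_end, v_end))
    return path
```

In this sketch, a user-drawn path is reduced to a list of `(u, v)` waypoints, and the sampled latents would be decoded one after another to produce continuous morphing. Linear blending is an assumption made for simplicity; other interpolation schemes could be substituted in `lerp` without changing the rest.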