Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Curtis Hawthorne; Andrew Stasyuk; Adam Roberts; Ian Simon; Anna Huang; Sander Dieleman; Erich Elsen; Jesse Engel; Douglas Eck

ICLR (2019)

Abstract

Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling both long- and short-term structure. Fortunately, most music is also highly structured and primarily composed of discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms (8 kHz) to ~100 s). This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music.
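
The timescale claim can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the short end of the range is one sample period at the quoted 8 kHz rate and the long end is the quoted ~100 s. The names `sample_period_s` and `orders_of_magnitude` are hypothetical helpers introduced here.

```python
import math

# Values quoted in the abstract; treating them as exact is an assumption.
SAMPLE_RATE_HZ = 8_000      # audio sample rate
LONG_TIMESCALE_S = 100.0    # approximate longest modeled timescale


def sample_period_s(sample_rate_hz: float) -> float:
    """Duration of one audio sample in seconds."""
    return 1.0 / sample_rate_hz


def orders_of_magnitude(short_s: float, long_s: float) -> float:
    """Base-10 logarithm of the ratio between two durations."""
    return math.log10(long_s / short_s)


period = sample_period_s(SAMPLE_RATE_HZ)
span = orders_of_magnitude(period, LONG_TIMESCALE_S)

print(f"sample period: {period * 1e3:.3f} ms")
print(f"span: {span:.1f} orders of magnitude")
```

Under these assumptions one sample lasts 0.125 ms, and the ratio to 100 s is 800,000, or about 5.9 orders of magnitude, which rounds to the six stated in the abstract.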

Research Areas

Machine intelligence
