Bach Doodle: Approachable music composition with machine learning at scale

Curtis Hawthorne
Monica Dinculescu
Leon Hong
Jacob Howcroft
Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR) (2019)

Abstract

Many of us like music, but composing can feel intimidating when we do not know where to begin. Even when we have a melody, without sufficient skill in harmony we may be deterred from developing it into a composition. Machine learning could extend our creative abilities by offering generative models that fill in the missing parts of a composition.

To make music composition more approachable, we designed a composition web-app where users can create their own melody and have it harmonized by a machine learning model. For inputting melodies, we designed a simplified sheet music interface that facilitates easy trial and error, and found that users adapted to it quickly even when they were not familiar with Western music notation. Users can rapidly explore different harmonizations by tweaking their melody and requesting new ones.

The harmonizations are provided by Coconet, a flexible generative model of counterpoint. Several technical challenges had to be overcome to support an interactive experience at scale. First, as most users do not have dedicated hardware to run machine learning models, we re-implemented Coconet in TensorFlow.js so that it could run in the browser. Second, our initial re-implementation took more than 40 seconds to generate two measures of music. By adopting dilated depth-wise separable convolutions and model quantization, we reduced this to 2 seconds. Third, to prepare for large-scale deployment, we calibrated a speed test to determine whether a user’s device is fast enough to run the model in the browser; if not, the harmonization requests were sent to remote TPU servers.
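
To give a rough sense of why depth-wise separable convolutions help with speed, the sketch below compares the weight count of a standard convolution with that of a separable one. The layer shape (3x3 kernel, 128 input and output channels) is a hypothetical choice for illustration only, not Coconet's actual configuration, and the count ignores biases. Dilation widens the receptive field without changing either count.

```python
def standard_conv_cost(kernel, c_in, c_out):
    """Weights (equivalently, multiply-adds per output position)
    of a standard 2-D convolution, ignoring biases."""
    return kernel * kernel * c_in * c_out


def separable_conv_cost(kernel, c_in, c_out):
    """Cost of a depth-wise separable convolution: one spatial filter
    per input channel, then a 1x1 pointwise mix across channels."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
KERNEL, C_IN, C_OUT = 3, 128, 128

standard = standard_conv_cost(KERNEL, C_IN, C_OUT)
separable = separable_conv_cost(KERNEL, C_IN, C_OUT)
ratio = standard / separable

print(f"standard: {standard}, separable: {separable}, ratio: {ratio:.1f}x")
```

For this assumed shape the separable layer needs roughly an eighth of the weights. The end-to-end speedup reported above also depends on quantization and on other implementation details that this arithmetic does not capture.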

In three days, the web-app received more than 50 million queries for harmonization from around the world. Users could choose to rate their compositions and contribute them to a public dataset, which we are releasing with this paper. We hope that the community will find this dataset useful for applications ranging from ethnomusicological studies to music education to improving machine learning models. We end with a quote from a user: “It's really fun to play with. This might be the first time in my life I feel competent at music.”