Douglas Eck
Doug is a Senior Research Director at Google, and leads research efforts at Google DeepMind in Generative Media, including image, video, 3D, music and audio generation. He also leads a broader group active in areas including Fundamental Learning Algorithms, Natural Language Processing, Multimodal Learning, Reinforcement Learning, Computer Vision and Generative Models. His own research lies at the intersection of machine learning and human-computer interaction (HCI). Doug created Magenta, an ongoing research project exploring the role of AI in art and music creation. He is also an advocate for PAIR, a multidisciplinary team that explores the human side of AI through fundamental research, building tools, creating design frameworks, and working with diverse communities.
Before joining Google in 2010, Doug did research in music perception, aspects of music performance, machine learning for large audio datasets and music recommendation. He completed his PhD in Computer Science and Cognitive Science at Indiana University in 2000 and went on to a postdoctoral fellowship with Juergen Schmidhuber at IDSIA in Lugano, Switzerland. From 2003 to 2010, Doug was faculty in Computer Science in the University of Montreal machine learning group (now the MILA machine learning lab), where he became Associate Professor.
Authored Publications
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Hyung Won Chung
Sebastian Gehrmann
Parker Schuh
Sasha Tsvyashchenko
Abhishek Rao
Yi Tay
Noam Shazeer
Nan Du
Reiner Pope
James Bradbury
Guy Gur-Ari
Toju Duke
Henryk Michalewski
Xavier Garcia
Liam Fedus
David Luan
Barret Zoph
Ryan Sepassi
David Dohan
Shivani Agrawal
Mark Omernick
Marie Pellat
Aitor Lewkowycz
Erica Moreira
Rewon Child
Oleksandr Polozov
Zongwei Zhou
Brennan Saeta
Michele Catasta
Jason Wei
Kathy Meier-Hellstern
arxiv:2204.02311 (2022)
Preview abstract
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
Deduplicating Training Data Makes Language Models Better
Andrew Nystrom
Chiyuan Zhang
Chris Callison-Burch
(2022) (to appear)
Preview abstract
As large language models scale up, researchers and engineers have chosen to use larger datasets of loosely-filtered internet text instead of curated texts.
We find that existing NLP datasets are highly repetitive and contain duplicated examples.
For example, there is an example in the training dataset C4 that has over 200,000 near duplicates.
As a whole, we find that 1.68% of C4 consists of near-duplicates.
Worse, we find a 1% overlap between the training and testing sets in these datasets.
Duplicate examples in training data inappropriately bias the distribution of rare/common sequences.
Models trained with non-deduplicated datasets are more likely to generate "memorized" examples.
Additionally, if those models are used for downstream applications, such as scoring likelihoods of given sequences, we find that models trained on non-deduplicated and deduplicated datasets have a difference in accuracy of on average TODO.
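The abstract above does not say how near-duplicates are found. As a rough illustration of the idea only, the sketch below treats two examples as near-duplicates when the Jaccard similarity of their word trigrams reaches a threshold. The function names, the trigram choice, the 0.7 threshold and the quadratic scan are all assumptions made for this example, not the authors' pipeline.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(examples, threshold=0.7, n=3):
    """Keep an example only if it is not a near-duplicate of one already kept.

    Hypothetical helper: a naive O(N^2) scan, fine for a toy corpus only.
    """
    kept, kept_shingles = [], []
    for text in examples:
        s = shingles(text, n)
        if all(jaccard(s, t) < threshold for t in kept_shingles):
            kept.append(text)
            kept_shingles.append(s)
    return kept


docs = [
    "the quick brown fox jumps over the lazy dog today",
    "the quick brown fox jumps over the lazy dog tonight",
    "an entirely different sentence about language models",
]
unique_docs = deduplicate(docs)
```

Here the first two documents share 7 of 9 distinct trigrams, so the second is dropped and `unique_docs` keeps the first and third. A corpus at the scale of C4 would need an approximate or indexed method rather than pairwise comparison.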
Emergent Social Learning via Multi-agent Reinforcement Learning
Kamal Ndousse
Sergey Levine
International Conference on Machine Learning (ICML) (2021)
Joint Attention for Multi-Agent Coordination and Social Learning
Dennis Lee
Jiaxing Wu
ICRA Workshop on Social Intelligence in Humans and Robots (2021)
Preview abstract
Joint attention — the ability to purposefully coordinate your attention with another person, and mutually attend to the same thing — is an important milestone in human cognitive development. In this paper, we ask whether joint attention can be useful as a mechanism for improving multi-agent coordination and social learning. We first develop deep reinforcement learning (RL) agents with a recurrent visual attention architecture. We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep, and the attention of other agents. Our results show that this joint attention incentive improves agents’ ability to solve difficult coordination tasks, by helping overcome the problem of exploring the combinatorial multi-agent action space. Joint attention leads to higher performance than a competitive centralized critic baseline across multiple environments. Further, we show that joint attention enhances agents’ ability to learn from experts present in their environment, even when performing single-agent tasks. Taken together, these findings suggest that joint attention may be a useful inductive bias for improving multi-agent learning.
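One way to picture the joint attention incentive described above is as a penalty that grows when an agent attends to different parts of the environment than its peers. The abstract does not name the distance measure, so the sketch below assumes a mean squared difference between flattened attention maps; the function names and the toy maps are hypothetical.

```python
def attention_difference(a, b):
    """Mean squared difference between two attention maps of equal length.

    Assumption: squared error stands in for whatever distance the authors use.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_attention_penalty(attention_maps, agent):
    """Average difference between one agent's attention and each other agent's."""
    own = attention_maps[agent]
    others = [m for i, m in enumerate(attention_maps) if i != agent]
    return sum(attention_difference(own, m) for m in others) / len(others)


# Three agents attending over four locations; agents 0 and 1 agree, agent 2 does not.
maps = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
penalty_agent0 = joint_attention_penalty(maps, 0)
penalty_agent2 = joint_attention_penalty(maps, 2)
```

Agent 2, which looks elsewhere, receives the larger penalty. In training, a term like this would be subtracted from the reward or added to the loss with some weight, which is a further assumption here.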
Automatic Detection of Generated Text is Easiest when Humans are Fooled
Chris Callison-Burch
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), pp. 1808-1822
Preview abstract
Recent advancements in neural language modelling make it possible to rapidly generate vast amounts of human-sounding text. The capabilities of humans and automatic discriminators to detect machine-generated text have been a large source of research interest, but humans and machines rely on different cues to make their decisions. Here, we perform careful benchmarking and analysis of three popular sampling-based decoding strategies (top-k, nucleus sampling, and untruncated random sampling) and show that improvements in decoding methods have primarily optimized for fooling humans. This comes at the expense of introducing statistical abnormalities that make detection easy for automatic systems. We also show that though both human and automatic detector performance improve with longer excerpt length, even multi-sentence excerpts can fool expert human raters over 30% of the time. Our findings reveal the importance of using both human and automatic detectors to assess the humanness of text generation systems.
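For readers unfamiliar with the three decoding strategies named in this abstract, the following minimal sketch shows how each one shapes a next-token distribution before sampling. The helper names and the toy distribution are assumptions for illustration; real systems apply the same idea to a model's output over a full vocabulary.

```python
import random


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose total mass reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


def sample(dist, rng=random):
    """Draw one token id from a {token_id: probability} mapping."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


probs = [0.5, 0.25, 0.125, 0.125]      # toy next-token distribution
top2 = top_k_filter(probs, 2)          # only tokens 0 and 1 survive
nucleus = nucleus_filter(probs, 0.8)   # tokens 0, 1 and 2 are needed to reach 0.8
untruncated = dict(enumerate(probs))   # random sampling from the full distribution
token = sample(nucleus)
```

Truncation removes the low-probability tail, which is one plausible source of the statistical abnormalities the abstract mentions: tokens outside the kept set can never appear.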
Towards Better Storylines with Sentence-Level Language Models
David Grangier
Chris Callison-Burch
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), pp. 1808-1822
Preview abstract
This work proposes a sentence-level language model which predicts the next sentence in a story given the embeddings of the previous sentences. The model operates at the sentence level and selects the next sentence within a finite set of fluent alternatives. By working with sentence embeddings instead of word embeddings, our model is able to efficiently consider a large number of alternative sentences. By considering only fluent sentences, our model is relieved from modeling fluency and can focus on longer range dependencies. Our method achieves state-of-the-art accuracy on the StoryCloze task in the unsupervised setting.
Learning via Social Awareness: Improving a Deep Generative Sketching Model with Facial Feedback
Jennifer McCleary
David Ha
Fred Bertsch
Rosalind Picard
International Joint Conference on Artificial Intelligence (IJCAI) 2018 (2020), pp. 1-9
Preview abstract
A known deficit of modern machine learning (ML) and deep learning (DL) methodology is that models must be carefully fine-tuned in order to solve a particular task. Most algorithms cannot generalize well to even highly similar tasks, let alone exhibit signs of general artificial intelligence (AGI). To address this problem, researchers have explored developing loss functions that act as intrinsic motivators that could motivate an ML or DL agent to learn across a number of domains. This paper argues that an important and useful intrinsic motivator is that of social interaction. We posit that making an AI agent aware of implicit social feedback from humans can allow for faster learning of more generalizable and useful representations, and could potentially impact AI safety. We collect social feedback in the form of facial expression reactions to samples from Sketch RNN, an LSTM-based variational autoencoder (VAE) designed to produce sketch drawings. We use a Latent Constraints GAN (LC-GAN) to learn from the facial feedback of a small group of viewers, by optimizing the model to produce sketches that it predicts will lead to more positive facial expressions. We show in multiple independent evaluations that the model trained with facial feedback produced sketches that are more highly rated, and induce significantly more positive facial expressions. Thus, we establish that implicit social feedback can improve the output of a deep learning model.
Magenta Studio: Augmenting Creativity with Deep Learning in Ableton Live
Yotam Mann
Jon Gillick
Monica Dinculescu
Carey Radebaugh
Curtis Hawthorne
Proceedings of the International Workshop on Musical Metacreation (MUME) (2019)
Preview abstract
The field of Musical Metacreation (MuMe) has produced impressive results for both autonomous and interactive creativity. However, there are few examples of these systems crossing over to the “mainstream” of music creation and consumption. We tie together existing frameworks (Electron, TensorFlow.js, and Max For Live) to develop a system whose purpose is to bring the promise of interactive MuMe to the realm of professional music creators. Combining compelling applications of deep learning based music generation with a focus on ease of installation and use in a popular DAW, we hope to expose more musicians and producers to the potential of using such systems in their creative workflows. Our suite of plug-ins for Ableton Live, named Magenta Studio, is available for download at http://g.co/magenta/studio along with its open source implementation.
Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset
Curtis Hawthorne
Andrew Stasyuk
Sander Dieleman
Erich Elsen
ICLR (2019)
Preview abstract
Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling both long- and short-term structure. Fortunately, most music is also highly structured and primarily composed of discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms (8 kHz) to ~100 s). This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music.
Music Transformer: Generating Music with Long-Term Structure
Ashish Vaswani
Jakob Uszkoreit
Noam Shazeer
Curtis Hawthorne
Matt Hoffman
Monica Dinculescu
ICLR (2019)
Preview abstract
Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to reusing of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani et al., 2017), a sequence model based on self-attention, has achieved compelling results in many generation tasks that require maintaining long-range coherence. This suggests that self-attention might also be well-suited to modeling music. In musical composition and performance, however, relative timing is critically important. Existing approaches for representing relative positional information in the Transformer modulate attention based on pairwise distance (Shaw et al., 2018). This is impractical for long sequences such as musical compositions since their memory complexity for intermediate relative information is quadratic in the sequence length. We propose an algorithm that reduces their intermediate memory requirement to linear in the sequence length. This enables us to demonstrate that a Transformer with our modified relative attention mechanism can generate minute-long compositions (thousands of steps, four times the length modeled in Oore et al., 2018) with compelling structure, generate continuations that coherently elaborate on a given motif, and in a seq2seq setup generate accompaniments conditioned on melodies. We evaluate the Transformer with our relative attention mechanism on two datasets, JSB Chorales and Piano-e-Competition, and obtain state-of-the-art results on the latter.
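The abstract does not spell out the memory-saving algorithm. The sketch below illustrates one rearrangement associated with this model, often called "skewing": rather than building a tensor of relative embeddings for every query-key pair, the queries are multiplied once by the relative-position embeddings and the resulting matrix is realigned with a pad, reshape and slice. The function name, the plain-list representation and the column convention (the last column holds relative distance 0) are assumptions made for this example.

```python
def skew(rel_logits):
    """Realign relative-position logits to absolute key positions.

    rel_logits[i][k] is the logit of query i for relative distance k - (L - 1),
    so column L - 1 means "same position". After skewing, result[i][j] holds the
    logit of query i for key j whenever j <= i; entries with j > i are leftovers
    that a causal mask is expected to discard.
    """
    length = len(rel_logits)
    # 1. Pad one dummy column on the left: L x (L + 1).
    padded = [[0.0] + list(row) for row in rel_logits]
    # 2. Reinterpret the same values as an (L + 1) x L matrix.
    flat = [value for row in padded for value in row]
    reshaped = [flat[r * length:(r + 1) * length] for r in range(length + 1)]
    # 3. Drop the first row, leaving an L x L matrix.
    return reshaped[1:]


# Toy input where each entry encodes its own (query, column) index as 10 * i + k.
L = 3
rel = [[10 * i + k for k in range(L)] for i in range(L)]
skewed = skew(rel)
```

For every pair with `j <= i`, `skewed[i][j]` equals `rel[i][L - 1 - (i - j)]`, the logit for a key that lies `i - j` steps in the past. Only L x L matrices are involved, which is where the memory saving comes from.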