Douglas Eck
Doug is a Senior Research Director at Google, and leads research efforts at Google DeepMind in Generative Media, including image, video, 3D, music and audio generation. He also leads a broader group active in areas including Fundamental Learning Algorithms, Natural Language Processing, Multimodal Learning, Reinforcement Learning, Computer Vision and Generative Models. His own research lies at the intersection of machine learning and human-computer interaction (HCI). Doug created Magenta, an ongoing research project exploring the role of AI in art and music creation. He is also an advocate for PAIR, a multidisciplinary team that explores the human side of AI through fundamental research, building tools, creating design frameworks, and working with diverse communities.
Before joining Google in 2010, Doug did research in music perception, aspects of music performance, machine learning for large audio datasets and music recommendation. He completed his PhD in Computer Science and Cognitive Science at Indiana University in 2000 and went on to a postdoctoral fellowship with Juergen Schmidhuber at IDSIA in Lugano, Switzerland. From 2003 to 2010, Doug was faculty in Computer Science in the University of Montreal machine learning group (now MILA machine learning lab), where he became Associate Professor.
Authored Publications
Deduplicating Training Data Makes Language Models Better
Andrew Nystrom
Chiyuan Zhang
Chris Callison-Burch
(2022) (to appear)
As large language models scale up, researchers and engineers have chosen to use larger datasets of loosely-filtered internet text instead of curated texts.
We find that existing NLP datasets are highly repetitive and contain duplicated examples.
For example, there is an example in the training dataset C4 that has over 200,000 near duplicates.
As a whole, we find that 1.68% of C4 is made up of near-duplicates.
Worse, we find a 1% overlap between the training and testing sets in these datasets.
Duplicate examples in training data inappropriately bias the distribution of rare/common sequences.
Models trained with non-deduplicated datasets are more likely to generate "memorized" examples.
Additionally, if those models are used for downstream applications, such as scoring likelihoods of given sequences, we find that models trained on non-deduplicated and deduplicated datasets have a difference in accuracy of on average TODO.
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Hyung Won Chung
Sebastian Gehrmann
Parker Schuh
Sasha Tsvyashchenko
Abhishek Rao
Yi Tay
Noam Shazeer
Nan Du
Reiner Pope
James Bradbury
Guy Gur-Ari
Toju Duke
Henryk Michalewski
Xavier Garcia
Liam Fedus
David Luan
Barret Zoph
Ryan Sepassi
David Dohan
Shivani Agrawal
Mark Omernick
Marie Pellat
Aitor Lewkowycz
Erica Moreira
Rewon Child
Oleksandr Polozov
Zongwei Zhou
Brennan Saeta
Michele Catasta
Jason Wei
arxiv:2204.02311 (2022)
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
Emergent Social Learning via Multi-agent Reinforcement Learning
Kamal Ndousse
Sergey Levine
International Conference on Machine Learning (ICML) (2021)
Joint Attention for Multi-Agent Coordination and Social Learning
Dennis Lee
Jiaxing Wu
ICRA Workshop on Social Intelligence in Humans and Robots (2021)
Joint attention — the ability to purposefully coordinate your attention with another person, and mutually attend to the same thing — is an important milestone in human cognitive development. In this paper, we ask whether joint attention can be useful as a mechanism for improving multi-agent coordination and social learning. We first develop deep reinforcement learning (RL) agents with a recurrent visual attention architecture. We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep, and the attention of other agents. Our results show that this joint attention incentive improves agents’ ability to solve difficult coordination tasks, by helping overcome the problem of exploring the combinatorial multi-agent action space. Joint attention leads to higher performance than a competitive centralized critic baseline across multiple environments. Further, we show that joint attention enhances agents’ ability to learn from experts present in their environment, even when performing single-agent tasks. Taken together, these findings suggest that joint attention may be a useful inductive bias for improving multi-agent learning.
Towards Better Storylines with Sentence-Level Language Models
David Grangier
Chris Callison-Burch
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), pp. 1808-1822
This work proposes a sentence-level language model which predicts the next sentence in a story given the embeddings of the previous sentences. The model operates at the sentence level and selects the next sentence within a finite set of fluent alternatives. By working with sentence embeddings instead of word embeddings, our model is able to efficiently consider a large number of alternative sentences. By considering only fluent sentences, our model is relieved from modeling fluency and can focus on longer range dependencies. Our method achieves state-of-the-art accuracy on the StoryCloze task in the unsupervised setting.
Learning via Social Awareness: Improving a Deep Generative Sketching Model with Facial Feedback
Jennifer McCleary
David Ha
Fred Bertsch
Rosalind Picard
International Joint Conference on Artificial Intelligence (IJCAI) 2018 (2020), pp. 1-9
A known deficit of modern machine learning (ML) and deep learning (DL) methodology is that models must be carefully fine-tuned in order to solve a particular task. Most algorithms cannot generalize well to even highly similar tasks, let alone exhibit signs of general artificial intelligence (AGI). To address this problem, researchers have explored developing loss functions that act as intrinsic motivators that could motivate an ML or DL agent to learn across a number of domains. This paper argues that an important and useful intrinsic motivator is that of social interaction. We posit that making an AI agent aware of implicit social feedback from humans can allow for faster learning of more generalizable and useful representations, and could potentially impact AI safety. We collect social feedback in the form of facial expression reactions to samples from Sketch RNN, an LSTM-based variational autoencoder (VAE) designed to produce sketch drawings. We use a Latent Constraints GAN (LC-GAN) to learn from the facial feedback of a small group of viewers, by optimizing the model to produce sketches that it predicts will lead to more positive facial expressions. We show in multiple independent evaluations that the model trained with facial feedback produced sketches that are more highly rated, and induce significantly more positive facial expressions. Thus, we establish that implicit social feedback can improve the output of a deep learning model.
Automatic Detection of Generated Text is Easiest when Humans are Fooled
Chris Callison-Burch
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), pp. 1808-1822
Recent advancements in neural language modelling make it possible to rapidly generate vast amounts of human-sounding text. The capabilities of humans and automatic discriminators to detect machine-generated text have been a large source of research interest, but humans and machines rely on different cues to make their decisions. Here, we perform careful benchmarking and analysis of three popular sampling-based decoding strategies (top-k, nucleus sampling, and untruncated random sampling) and show that improvements in decoding methods have primarily optimized for fooling humans. This comes at the expense of introducing statistical abnormalities that make detection easy for automatic systems. We also show that though both human and automatic detector performance improve with longer excerpt length, even multi-sentence excerpts can fool expert human raters over 30% of the time. Our findings reveal the importance of using both human and automatic detectors to assess the humanness of text generation systems.
View details
Preview abstract
We explore models for translating abstract musical ideas (scores, rhythms) into expressive performances using seq2seq and recurrent variational information bottleneck (VIB) models. Though seq2seq models usually require painstakingly aligned corpora, we show that it is possible to adapt an approach from the Generative Adversarial Network (GAN) literature (e.g. Pix2Pix, Vid2Vid) to sequences, creating large volumes of paired data by performing simple transformations and training generative models to plausibly invert these transformations. Music, and drumming in particular, provides a strong test case for this approach because many common transformations (quantization, removing voices) have clear semantics, and learning to invert them has real-world applications. Focusing on the case of drum set players, we create and release a new dataset for this purpose, containing over 13 hours of recordings by professional drummers aligned with fine-grained timing and dynamics information. We also explore some of the creative potential of these models, demonstrating improvements on state-of-the-art methods for Humanization (instantiating a performance from a musical score).
Unsupervised Hierarchical Story Infilling
David Grangier
Chris Callison-Burch
NAACL 2019 Workshop on Narrative Understanding, Minneapolis, MN (2019)
Story infilling involves predicting words to go into a missing span from a story.
This challenging task has the potential to transform interactive tools for creative writing.
However, state-of-the-art conditional language models have trouble balancing fluency and coherence with novelty and diversity. We address this limitation with a hierarchical model which first selects a set of rare words and then generates text conditioned on that set. By relegating the high entropy task of picking rare words to a word-sampling model, the second-stage model conditioned on those words can achieve high fluency and coherence by searching for likely sentences, without sacrificing diversity.
Magenta Studio: Augmenting Creativity with Deep Learning in Ableton Live
Yotam Mann
Jon Gillick
Monica Dinculescu
Carey Radebaugh
Curtis Hawthorne
Proceedings of the International Workshop on Musical Metacreation (MUME) (2019)
The field of Musical Metacreation (MuMe) has produced impressive results for both autonomous and interactive creativity. However, there are few examples of these systems crossing over to the “mainstream” of music creation and consumption. We tie together existing frameworks (Electron, TensorFlow.js, and Max For Live) to develop a system whose purpose is to bring the promise of interactive MuMe to the realm of professional music creators. Combining compelling applications of deep learning based music generation with a focus on ease of installation and use in a popular DAW, we hope to expose more musicians and producers to the potential of using such systems in their creative workflows. Our suite of plug-ins for Ableton Live, named Magenta Studio, is available for download at http://g.co/magenta/studio along with its open source implementation.