Douglas Eck
Doug is a Senior Research Director at Google, and leads research efforts at Google DeepMind in Generative Media, including image, video, 3D, music and audio generation. He also leads a broader group active in areas including Fundamental Learning Algorithms, Natural Language Processing, Multimodal Learning, Reinforcement Learning, Computer Vision and Generative Models. His own research lies at the intersection of machine learning and human-computer interaction (HCI). Doug created Magenta, an ongoing research project exploring the role of AI in art and music creation. He is also an advocate for PAIR, a multidisciplinary team that explores the human side of AI through fundamental research, building tools, creating design frameworks, and working with diverse communities.
Before joining Google in 2010, Doug did research in music perception, aspects of music performance, machine learning for large audio datasets and music recommendation. He completed his PhD in Computer Science and Cognitive Science at Indiana University in 2000 and went on to a postdoctoral fellowship with Juergen Schmidhuber at IDSIA in Lugano Switzerland. From 2003-2010, Doug was faculty in Computer Science in the University of Montreal machine learning group (now MILA machine learning lab), where he became Associate Professor.
Authored Publications
Sort By
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Hyung Won Chung
Sebastian Gehrmann
Parker Schuh
Sasha Tsvyashchenko
Abhishek Rao
Yi Tay
Noam Shazeer
Nan Du
Reiner Pope
James Bradbury
Guy Gur-Ari
Toju Duke
Henryk Michalewski
Xavier Garcia
Liam Fedus
David Luan
Barret Zoph
Ryan Sepassi
David Dohan
Shivani Agrawal
Mark Omernick
Marie Pellat
Aitor Lewkowycz
Erica Moreira
Rewon Child
Oleksandr Polozov
Zongwei Zhou
Brennan Saeta
Michele Catasta
Jason Wei
Kathy Meier-Hellstern
arxiv:2204.02311 (2022)
Preview abstract
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM. We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.
View details
Deduplicating Training Data Makes Language Models Better
Andrew Nystrom
Chiyuan Zhang
Chris Callison-Burch
(2022) (to appear)
Preview abstract
As large language models scale up, researchers and engineers have chosen to use larger datasets of loosely-filtered internet text instead of curated texts.
We find that existing NLP datasets are highly repetitive and contain duplicated examples.
For example, there is an example in the training dataset C4 that has over 200,000 near duplicates.
As a whole, we find that 1.68% of the C4 are near-duplicates.
Worse, we find a 1% overlap between the training and testing sets in these datasets.
Duplicate examples in training data inappropriately biases the distribution of rare/common sequences.
Models trained with non-deduplicated datasets are more likely to generate ``memorized" examples.
Additionally, if those models are used for downstream applications, such as scoring likelihoods of given sequences, we find that models trained on non-deduplicated and deduplicated datasets have a difference in accuracy of on average TODO.
View details
Joint Attention for Multi-Agent Coordination and Social Learning
Dennis Lee
Jiaxing Wu
ICRA Workshop on Social Intelligence in Humans and Robots (2021)
Preview abstract
Joint attention — the ability to purposefully coordinate your attention with another person, and mutually attend to the same thing — is an important milestone in human cognitive development. In this paper, we ask whether joint attention can be useful as a mechanism for improving multi-agent coordination and social learning. We first develop deep reinforcement learning (RL) agents with a recurrent visual attention architecture. We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep, and the attention of other agents. Our results show that this joint attention incentive improves agents’ ability to solve difficult coordination tasks, by helping overcome the problem of exploring the combinatorial multi-agent action space. Joint attention leads to higher performance than a competitive centralized critic baseline across multiple environments. Further, we show that joint attention enhances agents’ ability to learn from experts present in their environment, even when performing single-agent tasks. Taken together, these findings suggest that joint attention may be a useful inductive bias for improving multi-agent learning.
View details
Emergent Social Learning via Multi-agent Reinforcement Learning
Kamal Ndousse
Sergey Levine
International Conference on Machine Learning (ICML) (2021)
Learning via Social Awareness: Improving a Deep Generative Sketching Model with Facial Feedback
Jennifer McCleary
David Ha
Fred Bertsch
Rosalind Picard
International Joint Conference on Artificial Intelligence (IJCAI) 2018 (2020), pp. 1-9
Preview abstract
A known deficit of modern machine learning (ML) and deep learning (DL) methodology
is that models must be carefully fine-tuned in order to solve a particular task. Most
algorithms cannot generalize well to even highly similar tasks, let alone exhibit signs of
general artificial intelligence (AGI). To address this problem, researchers have explored
developing loss functions that act as intrinsic motivators that could motivate an ML or
DL agent to learn across a number of domains. This paper argues that an important
and useful intrinsic motivator is that of social interaction. We posit that making an AI
agent aware of implicit social feedback from humans can allow for faster learning of more
generalizable and useful representations, and could potentially impact AI safety. We collect
social feedback in the form of facial expression reactions to samples from Sketch RNN, an
LSTM-based variational autoencoder (VAE) designed to produce sketch drawings. We
use a Latent Constraints GAN (LC-GAN) to learn from the facial feedback of a small
group of viewers, by optimizing the model to produce sketches that it predicts will lead
to more positive facial expressions. We show in multiple independent evaluations that
the model trained with facial feedback produced sketches that are more highly rated, and
induce significantly more positive facial expressions. Thus, we establish that implicit social
feedback can improve the output of a deep learning model.
View details
Automatic Detection of Generated Text is Easiest when Humans are Fooled
Chris Callison-Burch
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), pp. 1808-1822
Preview abstract
Recent advancements in neural language modelling make it possible to rapidly generate vast amounts of human-sounding text. The capabilities of humans and automatic discriminators to detect machine-generated text have been a large source of research interest, but humans and machines rely on different cues to make their decisions. Here, we perform careful benchmarking and analysis of three popular sampling-based decoding strategies—top-_k_, nucleus sampling, and untruncated random sampling—and show that improvements in decoding methods have primarily optimized for fooling humans. This comes at the expense of introducing statistical abnormalities that make detection easy for automatic systems. We also show that though both human and automatic detector performance improve with longer excerpt length, even multi-sentence excerpts can fool expert human raters over 30% of the time. Our findings reveal the importance of using both human and automatic detectors to assess the humanness of text generation systems.
View details
Towards Better Storylines with Sentence-Level Language Models
David Grangier
Chris Callison-Burch
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), pp. 1808-1822
Preview abstract
This work proposes a sentence-level language model which predicts
the next sentence in a story given the embeddings of the previous
sentences. The model operates at the sentence-level and selects the
next sentence within a fine set of fluent alternatives. By working
with sentence embeddings instead of word embeddings, our model is
able to efficiently consider a large number of alternative sentences.
By considering only fluent sentences, our model is relieved from modeling
fluency and can focus on longer range dependencies. Our method achieves
state-of-the-art accuracy on the StoryCloze task in the unsupervised setting.
View details
Preview abstract
We explore models for translating abstract musical ideas (scores, rhythms) into expressive performances using seq2seq and recurrent variational information bottleneck (VIB) models. Though seq2seq models usually require painstakingly aligned corpora, we show that it is possible to adapt an approach from the Generative Adversarial Network (GAN) literature (e.g. Pix2Pix, Vid2Vid) to sequences, creating large volumes of paired data by performing simple transformations and training generative models to plausibly invert these transformations. Music, and drumming in particular, provides a strong test case for this approach because many common transformations (quantization, removing voices) have clear semantics, and learning to invert them has real-world applications. Focusing on the case of drum set players, we create and release a new dataset for this purpose, containing over 13 hours of recordings by professional drummers aligned with fine-grained timing and dynamics information. We also explore some of the creative potential of these models, demonstrating improvements on state-of-the-art methods for Humanization (instantiating a performance from a musical score).
View details
Preview abstract
Dramatic advances in generative models have resulted in near photographic quality for artificially rendered faces, animals and other objects in the natural world. In spite of such advances, a higher level understanding of vision and imagery does not arise from exhaustively modeling an object, but instead identifying higher-level attributes that best summarize the aspects of an object. In this work we attempt to model the drawing process of fonts by building sequential generative models of vector graphics. This model has the benefit of providing a scale-invariant representation for imagery whose latent representation may be systematically manipulated and exploited to perform style propagation. We demonstrate these results on a large dataset of fonts crawled from the web and highlight how such a model captures the statistical dependencies and richness of this dataset. We envision that our model can find use as a tool for graphic designers to facilitate font design.
View details
Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset
Curtis Hawthorne
Andrew Stasyuk
Sander Dieleman
Erich Elsen
ICLR (2019)
Preview abstract
Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling both long- and short-term structure. Fortunately, most music is also highly structured and primarily composed of discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.01 ms (8 kHz) to ~100 s). This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music.
View details