Natasha Jaques
Natasha Jaques holds a joint position as a Research Scientist at Google Brain and a post-doc at UC Berkeley. Her research focuses on social reinforcement learning: developing multi-agent RL algorithms that can improve single-agent learning, generalization, coordination, and human-AI collaboration. Natasha received her PhD from MIT, where she worked on affective computing and deep, reinforcement, and machine learning. Her work has received the best demo award at NeurIPS 2016, best paper at the NeurIPS workshops on ML for Healthcare and Cooperative AI, and an honourable mention for best paper at ICML 2019. She has interned at DeepMind and Google Brain, and is an OpenAI Scholars mentor. Her work has been featured in Quartz, the MIT Technology Review, Boston Magazine, and on CBC Radio. Natasha earned her Master's degree from the University of British Columbia, and undergraduate degrees in Computer Science and Psychology from the University of Regina.
See all publications at: https://scholar.google.com/citations?user=8iCb2TwAAAAJ
Authored Publications
Less is More: Generating Grounded Navigation Instructions from Landmarks
Su Wang
Jordi Orbay
Vighnesh Birodkar
Izzeddin Gur
Peter Anderson
CVPR (2022) (to appear)
Abstract
We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes. Existing generators suffer from poor visual grounding, causing them to rely on language priors and hallucinate objects. Our MARKY-MT5 system addresses this by focusing on visual landmarks; it comprises a first-stage landmark detector and a second-stage generator -- a multimodal, multilingual, multitask encoder-decoder. To train it, we bootstrap grounded landmark annotations on top of the Room-across-Room (RxR) dataset. Using text parsers, weak supervision from RxR's pose traces, and a multilingual image-text encoder trained on 1.8b images, we identify 1.1m English, Hindi and Telugu landmark descriptions and ground them to specific regions in panoramas. On Room-to-Room, human wayfinders obtain success rates (SR) of 71% following MARKY-MT5's instructions, just shy of their 75% SR following human instructions -- and well above SRs with other generators. Evaluations on RxR's longer, diverse paths obtain 61-64% SRs in three languages. Generating such high-quality navigation instructions in novel environments is a step towards conversational navigation tools and could facilitate larger-scale training of instruction-following agents.
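To make the two-stage structure concrete, here is a hedged Python sketch of the pipeline described above: a landmark detector grounds candidate phrases to panorama regions using an image-text encoder, and a generator then verbalizes the ordered landmarks as an instruction. All names and the toy embedding function are illustrative stand-ins, not the MARKY-MT5 implementation or API.

```python
import numpy as np

def embed(items):
    """Stand-in for a multilingual image-text encoder; returns fixed random vectors."""
    rng = np.random.default_rng(abs(hash(tuple(items))) % (2**32))
    return rng.normal(size=(len(items), 64))

def detect_landmarks(pano_regions, phrases):
    """Stage 1 (toy): ground each candidate phrase to its best-matching region."""
    r, p = embed(pano_regions), embed(phrases)
    r /= np.linalg.norm(r, axis=1, keepdims=True)
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    sims = p @ r.T                                # phrase x region similarity
    best = sims.argmax(axis=1)
    return [(phrases[i], pano_regions[best[i]]) for i in range(len(phrases))]

def generate_instruction(landmark_sequence):
    """Stage 2 (toy): a multimodal encoder-decoder would condition on the
    ordered landmarks; here we simply verbalize them as a placeholder."""
    steps = [f"head toward the {name}" for name, _ in landmark_sequence]
    return ", then ".join(steps) + "."

route = detect_landmarks(["region_0", "region_1"], ["blue sofa", "glass door"])
print(generate_instruction(route))
```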
Emergent Social Learning via Multi-agent Reinforcement Learning
Kamal Ndousse
Sergey Levine
International Conference on Machine Learning (ICML) (2021)
Environment Generation for Zero-Shot Compositional Reinforcement Learning
Izzeddin Gur
Yingjie Miao
Jongwook Choi
Manoj Tiwari
Honglak Lee
Thirty-Fifth Conference on Neural Information Processing Systems (NeurIPS) (2021)
Abstract
Many real-world problems are compositional – solving them requires completing interdependent sub-tasks, either in series or in parallel, that can be represented as a dependency graph. Deep reinforcement learning (RL) agents often struggle to learn such complex tasks due to the long time horizons and sparse rewards. To address this problem, we present Compositional Design of Environments (CoDE), which trains a Generator agent to automatically build a series of compositional tasks tailored to the RL agent's current skill level. This automatic curriculum not only enables the agent to learn more complex tasks than it could have otherwise, but also selects tasks where the agent's performance is weak, enhancing its robustness and ability to generalize zero-shot to unseen tasks at test-time. We analyze why current environment generation techniques are insufficient for the problem of generating compositional tasks, and propose a new algorithm that addresses these issues. Our results assess learning and generalization across multiple compositional tasks, including the real-world problem of learning to navigate and interact with web pages. We learn to generate environments composed of multiple pages or rooms, and train RL agents capable of completing a wide range of complex tasks in those environments. We contribute two new benchmark frameworks for generating compositional tasks, compositional MiniGrid and gMiniWoB for web navigation. CoDE yields a 4x higher success rate than the strongest baseline, and demonstrates strong performance on real websites after learning on 3500 primitive tasks.
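As a rough illustration of the curriculum idea (not the paper's generator objective or architecture), the sketch below has a generator propose task sizes near the learner's frontier of competence; the environment, learner, and success model are toy placeholders.

```python
import random

def learner_success_prob(skill, num_subtasks):
    # Toy model: each extra sub-task makes success less likely.
    return max(0.0, 1.0 - num_subtasks / (skill + 1.0))

skill = 1.0
for step in range(200):
    # Generator: pick the task size whose success probability is closest to 50%,
    # i.e. tasks the learner can almost, but not quite, solve.
    num_subtasks = max(range(1, 10),
                       key=lambda n: 1.0 - abs(learner_success_prob(skill, n) - 0.5))
    # Learner attempts the task; solving harder tasks raises its skill.
    if random.random() < learner_success_prob(skill, num_subtasks):
        skill += 0.05 * num_subtasks
print(f"final toy skill level: {skill:.2f}")
```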
Joint Attention for Multi-Agent Coordination and Social Learning
Dennis Lee
Jiaxing Wu
ICRA Workshop on Social Intelligence in Humans and Robots (2021)
Abstract
Joint attention — the ability to purposefully coordinate your attention with another person, and mutually attend to the same thing — is an important milestone in human cognitive development. In this paper, we ask whether joint attention can be useful as a mechanism for improving multi-agent coordination and social learning. We first develop deep reinforcement learning (RL) agents with a recurrent visual attention architecture. We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep, and the attention of other agents. Our results show that this joint attention incentive improves agents’ ability to solve difficult coordination tasks, by helping overcome the problem of exploring the combinatorial multi-agent action space. Joint attention leads to higher performance than a competitive centralized critic baseline across multiple environments. Further, we show that joint attention enhances agents’ ability to learn from experts present in their environment, even when performing single-agent tasks. Taken together, these findings suggest that joint attention may be a useful inductive bias for improving multi-agent learning.
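A minimal sketch of the joint-attention incentive, assuming each agent outputs a normalized attention map over the same set of locations; the choice of KL divergence and the bonus coefficient are assumptions, and the paper's recurrent visual attention architecture is not reproduced.

```python
import numpy as np

def kl(p, q, eps=1e-8):
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def joint_attention_bonus(attentions, coef=0.1):
    """attentions: list of normalized attention maps, one per agent.
    Each agent gets a bonus for matching the other agents' attention."""
    bonuses = []
    for i, a_i in enumerate(attentions):
        others = [a_j for j, a_j in enumerate(attentions) if j != i]
        mean_div = np.mean([kl(a_i, a_j) for a_j in others])
        bonuses.append(-coef * mean_div)      # smaller divergence -> larger bonus
    return bonuses

a1 = np.array([0.7, 0.2, 0.1])
a2 = np.array([0.6, 0.3, 0.1])
print(joint_attention_bonus([a1, a2]))
```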
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos
Clare Lyle
Yarin Gal
Sergey Levine
Greg Farquhar
International Conference on Machine Learning (2021)
Abstract
We study a setting in which an agent has access to data from the interaction of other agents with the same environment. However, it has no access to the rewards or goals of these agents, and their objectives and levels of expertise may vary widely. These assumptions are common in multi-agent settings, such as driving. To effectively use this data, we turn to the framework of successor features. This allows us to disentangle shared features and dynamics of the environment from agent-specific rewards and policies. We propose a multi-task inverse reinforcement learning (IRL) algorithm, called inverse temporal difference learning (ITD), that learns shared state features, alongside per-agent successor features and preference vectors, purely from demonstrations without reward labels. We further show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called PsiPhi-learning (pronounced `Sci-Fi'). We provide empirical evidence for the effectiveness of PsiPhi-learning in a variety of environments, on RL, imitation, IRL, and few-shot transfer, and derive worst-case bounds for its performance in zero-shot transfer to new tasks.
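The successor-features decomposition that ITD builds on can be written in a standard form (the paper's exact losses and parameterization may differ):

```latex
\begin{align*}
r^{k}(s,a) &\approx \phi(s,a)^{\top} w^{k}
  && \text{per-agent reward from shared features } \phi \\
\psi^{k}(s,a) &= \mathbb{E}_{\pi^{k}}\Big[\textstyle\sum_{t \ge 0}\gamma^{t}\phi(s_t,a_t) \,\Big|\, s_0 = s,\ a_0 = a\Big]
  && \text{agent-$k$ successor features} \\
\psi^{k}(s,a) &\approx \phi(s,a) + \gamma\,\psi^{k}(s',a')
  && \text{TD consistency, fit from demonstrations} \\
Q^{k}(s,a) &= \psi^{k}(s,a)^{\top} w^{k}
  && \text{action values recovered without reward labels}
\end{align*}
```

Intuitively, the shared features and dynamics live in $\phi$ and $\psi$, while each agent's preferences live in its vector $w^{k}$, which is what allows demonstrations from many differently-motivated agents to be pooled.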
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Michael Dennis*
Eugene Vinitsky
Alexandre Bayen
Stuart Russell
Andrew Critch
Sergey Levine
Neural Information Processing Systems (2020)
Abstract
A wide range of reinforcement learning (RL) problems - including robustness, transfer learning, unsupervised RL, and emergent complexity - require specifying a distribution of tasks or environments in which a policy will be trained. However, creating a useful distribution of environments is error prone, and takes a significant amount of developer time and effort. We propose Unsupervised Environment Design (UED) as an alternative paradigm, where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments. Existing approaches to automatically generating environments suffer from common failure modes: domain randomization cannot generate structure or adapt the difficulty of the environment to the agent's learning progress, and minimax adversarial training leads to worst-case environments that are often unsolvable. To generate structured, solvable environments for our protagonist agent, we introduce a second, antagonist agent that is allied with the environment-generating adversary. The adversary is motivated to generate environments which maximize regret, defined as the difference between the protagonist and antagonist agents' returns. We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED). Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
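A minimal sketch of the reward wiring, assuming episode returns in the adversary's proposed environment are already available; the regret estimator below (best antagonist episode minus mean protagonist return) is one common choice, and the rest of the training loop is omitted.

```python
def adversary_regret(protagonist_returns, antagonist_returns):
    """Regret estimate rewarded to the environment adversary (and antagonist);
    the protagonist is trained to minimize this same quantity. Because the
    antagonist must actually achieve return, unsolvable environments score low."""
    return max(antagonist_returns) - sum(protagonist_returns) / len(protagonist_returns)

# Example: the antagonist solves the proposed environment while the protagonist
# mostly fails, so the adversary is rewarded for proposing environments like it.
print(adversary_regret(protagonist_returns=[0.1, 0.3, 0.2], antagonist_returns=[0.9, 0.7]))
```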
Human-centric dialog training via offline reinforcement learning
Judy Hanwen Shen
Craig Ferguson
Agata Lapedriza
Noah Jones
Shixiang Gu
Rosalind Picard
Empirical Methods in Natural Language Processing (EMNLP) (2020)
Abstract
How can we train a dialog model to produce better conversations by learning from human feedback, without the risk of humans teaching it harmful chat behaviors? We start by hosting models online, and gather human feedback from real-time, open-ended conversations, which we then use to train and improve the models using offline reinforcement learning (RL). We identify implicit conversational cues including language similarity, elicitation of laughter, sentiment, and more, which indicate positive human feedback, and embed these in multiple reward functions. A well-known challenge is that learning an RL policy in an offline setting usually fails due to the lack of ability to explore and the tendency to make over-optimistic estimates of future reward. These problems become even harder when using RL for language models, which can easily have a 20,000 action vocabulary and many possible reward functions. We solve the challenge by developing a novel class of offline RL algorithms. These algorithms use KL-control to penalize divergence from a pre-trained prior language model, and use a new strategy to make the algorithm pessimistic, instead of optimistic, in the face of uncertainty. We test the resulting dialog model with ratings from 80 users in an open-domain setting and find it achieves significant improvements over existing deep offline RL approaches. The novel offline RL method is viable for improving any existing generative dialog model using a static dataset of human feedback.
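The KL-control component can be illustrated as simple per-utterance reward shaping against a frozen prior language model, as in the PyTorch-style sketch below; this is a simplified view and does not include the paper's offline Q-learning machinery or its pessimism mechanism.

```python
import torch
import torch.nn.functional as F

def kl_shaped_reward(human_reward, policy_logits, prior_logits, kl_weight=0.1):
    """human_reward: scalar implicit feedback for the utterance.
    policy_logits / prior_logits: [seq_len, vocab] token logits from the
    fine-tuned policy and the frozen pre-trained prior language model."""
    log_pi = F.log_softmax(policy_logits, dim=-1)
    log_prior = F.log_softmax(prior_logits, dim=-1)
    kl_per_token = (log_pi.exp() * (log_pi - log_prior)).sum(-1)   # KL(pi || prior)
    return human_reward - kl_weight * kl_per_token.sum()

policy_logits = torch.randn(5, 20000)   # toy 20,000-token vocabulary
prior_logits = torch.randn(5, 20000)
print(kl_shaped_reward(torch.tensor(1.0), policy_logits, prior_logits))
```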
Learning via Social Awareness: Improving a Deep Generative Sketching Model with Facial Feedback
Jennifer McCleary
David Ha
Fred Bertsch
Rosalind Picard
International Joint Conference on Artificial Intelligence (IJCAI) 2018 (2020), pp. 1-9
Abstract
A known deficit of modern machine learning (ML) and deep learning (DL) methodology is that models must be carefully fine-tuned in order to solve a particular task. Most algorithms cannot generalize well to even highly similar tasks, let alone exhibit signs of general artificial intelligence (AGI). To address this problem, researchers have explored developing loss functions that act as intrinsic motivators that could motivate an ML or DL agent to learn across a number of domains. This paper argues that an important and useful intrinsic motivator is that of social interaction. We posit that making an AI agent aware of implicit social feedback from humans can allow for faster learning of more generalizable and useful representations, and could potentially impact AI safety. We collect social feedback in the form of facial expression reactions to samples from Sketch RNN, an LSTM-based variational autoencoder (VAE) designed to produce sketch drawings. We use a Latent Constraints GAN (LC-GAN) to learn from the facial feedback of a small group of viewers, by optimizing the model to produce sketches that it predicts will lead to more positive facial expressions. We show in multiple independent evaluations that the model trained with facial feedback produced sketches that are more highly rated, and induce significantly more positive facial expressions. Thus, we establish that implicit social feedback can improve the output of a deep learning model.
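The underlying idea, steering the pretrained generative model's latent space toward samples that a learned facial-feedback predictor scores as positive, can be sketched as straightforward latent-space optimization; the paper instead trains an LC-GAN, and both the predictor and the latent codes below are toy stand-ins.

```python
import torch

latent_dim = 8
smile_head = torch.nn.Linear(latent_dim, 1)     # toy facial-feedback predictor

z = torch.randn(4, latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    predicted_smile = torch.sigmoid(smile_head(z)).mean()
    (-predicted_smile).backward()               # ascend on predicted positivity
    opt.step()
# Decoding z with the pretrained Sketch RNN decoder would yield the steered sketches.
print(float(torch.sigmoid(smile_head(z)).mean()))
```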
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Angeliki Lazaridou
Edward Hughes
Caglar Gulcehre
Pedro A. Ortega
DJ Strouse
Joel Z Leibo
Nando de Freitas
International Conference on Machine Learning (ICML) (2019)
Abstract
We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agents. Actions that lead to bigger changes in other agents' behavior are considered influential and are rewarded. We show that this is equivalent to rewarding agents for having high mutual information between their actions. Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. The influence rewards for all agents can be computed in a decentralized way by enabling agents to learn a model of other agents using deep neural networks. In contrast, key previous works on emergent communication in the MARL setting were unable to learn diverse policies in a decentralized manner and had to resort to centralized training. Consequently, the influence reward opens up a window of new opportunities for research in this area.
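A small sketch of the counterfactual influence computation, assuming access to a model of the other agent's conditional action distribution (which the paper learns with deep neural networks); shapes and names are illustrative.

```python
import numpy as np

def influence_reward(p_b_given_a, p_a, a_taken, eps=1e-8):
    """Influence of agent A's chosen action on agent B.
    p_b_given_a: [n_actions_A, n_actions_B] distribution of B's action for each
    possible action of A. p_a: A's policy over its own actions.
    a_taken: index of the action A actually took."""
    conditional = p_b_given_a[a_taken]            # p(a_B | a_A = a_taken)
    marginal = p_a @ p_b_given_a                  # sum_a' p(a_B | a_A = a') p(a_A = a')
    # KL between B's conditional and counterfactual-marginal action distributions.
    return float(np.sum(conditional * np.log((conditional + eps) / (marginal + eps))))

p_b_given_a = np.array([[0.9, 0.1],
                        [0.2, 0.8]])
p_a = np.array([0.5, 0.5])
print(influence_reward(p_b_given_a, p_a, a_taken=0))
```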
Learning via Social Awareness: Improving a Deep Generative Sketching Model with Facial Feedback
Jennifer McCleary
David Ha
Fred Bertsch
Rosalind Picard
ICLR 2018 Workshop
Abstract
In the quest towards general artificial intelligence (AI), researchers have explored developing loss functions that function as intrinsic motivators in the absence of external rewards. This paper takes the position that current research has overlooked an important and useful intrinsic motivator: social interaction. We posit that making an AI agent aware of implicit social feedback from humans can allow for more rapid learning of more generalizable and useful representations, and could potentially impact AI safety. We collect social feedback in the form of facial expression reactions to samples from Sketch RNN, an LSTM-based variational autoencoder designed to produce sketch drawings. We use a Latent Constraints GAN (LC-GAN) to learn from the facial feedback of a small group of viewers, and then show in an independent evaluation with 76 users that this model produced sketches that lead to significantly more smiling and less frowning than the baseline. Thus, we establish that implicit social feedback can improve the output of a deep learning model.