Edgar A. Duéñez Guzmán
I am a software engineer and a scientist. I develop algorithms and models for complex problem solving. Currently I work at DeepMind on the Multi-Agent Research team, studying how to get artificial learning agents to cooperate in social dilemmas.
My key expertise is in distributed systems, machine learning, natural computation, theory of computation, game theory, and evolutionary biology.
I have an undergraduate degree in mathematics, a master's in computer science and industrial mathematics, and a doctorate in computer science. In my postdoctoral research I studied social evolution, swarm robotics, and genetics.
Authored Publications
Safe Policy Learning for Continuous Control
Ofir Nachum
Mohammad Ghavamzadeh
Conference on Robot Learning (CoRL) (2020)
Abstract
We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through near-safe policies, i.e., policies that keep the agent in desirable situations, both during training and at convergence. We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization algorithms that are based on a Lyapunov approach to solve them. Our algorithms can use any standard policy gradient (PG) method, such as deep deterministic policy gradient (DDPG) or proximal policy optimization (PPO), to train a neural network policy, while enforcing near-constraint satisfaction for every policy update by projecting either the policy parameter or the selected action onto the set of feasible solutions induced by the state-dependent linearized Lyapunov constraints. Compared to the existing constrained PG algorithms, ours are more data efficient as they are able to utilize both on-policy and off-policy data. Moreover, in practice our action-projection algorithm often leads to less conservative policy updates and allows for natural integration into an end-to-end PG training pipeline. We evaluate our algorithms and compare them with the state-of-the-art baselines on several simulated (MuJoCo) tasks, as well as a real-world robot obstacle-avoidance problem, demonstrating their effectiveness in terms of balancing performance and constraint satisfaction.
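As a concrete illustration of the action-projection step described above, here is a minimal sketch (a toy under my own assumptions, not the paper's implementation): a proposed action is projected in closed form onto the half-space induced by a single state-dependent linearized Lyapunov constraint g·a + c ≤ 0, where g and c are assumed to be supplied by the linearization of the safety constraint.

```python
import numpy as np

def project_action(a, g, c):
    """Euclidean projection of action `a` onto {a : g.a + c <= 0}.

    `g` and `c` stand in for one state-dependent linearized
    Lyapunov constraint; with several constraints this becomes a
    small quadratic program rather than a closed-form step.
    """
    violation = g @ a + c
    if violation <= 0.0:
        return a  # already feasible: keep the policy's action
    # Move along -g just far enough to reach the constraint boundary.
    return a - (violation / (g @ g)) * g

# Example: a 2-D action nudged back onto the safe half-space.
print(project_action(np.array([1.0, 2.0]), np.array([0.0, 1.0]), -1.5))
# -> [1.  1.5]
```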
Evolving intrinsic motivations for altruistic behavior
Jane Wang
Edward Hughes
Chrisantha Fernando
Wojciech Czarnecki
Joel Z Leibo
AAMAS (2019)
Abstract
Multi-agent cooperation is an important feature of the natural world. Many tasks involve individual incentives that are misaligned with the common good, yet a wide range of organisms from insects to humans display a remarkable ability to overcome their differences and collaborate. Therefore, the emergence of cooperative behaviour amongst self-interested individuals is an important question for the fields of multi-agent reinforcement learning (MARL) and evolutionary theory. Here, we study a particular class of multi-agent problems called intertemporal social dilemmas, where the conflict between the individual and the group is particularly sharp. By combining MARL with appropriately structured natural selection, we demonstrate that individual inductive biases for cooperation can be learned in a model-free way, in contrast to previous work. To achieve this, we introduce an innovative modular architecture for deep reinforcement learning agents which supports multi-level selection. We present state-of-the-art results in two challenging environments, and interpret these in the context of cultural and ecological evolution.
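To make the selection mechanism concrete, here is a toy sketch of the outer evolutionary loop (under simplifying assumptions that are mine, not the paper's: a flat weight-vector genotype for the intrinsic-reward module, truncation selection with Gaussian mutation, and a hypothetical `train_and_score` callback standing in for the full multi-agent RL inner loop):

```python
import random

def evolve_intrinsic_weights(train_and_score, n_features=4,
                             pop_size=20, generations=50, sigma=0.1):
    """Evolve weights that mix intrinsic-reward features into an
    agent's reward. `train_and_score(weights)` is assumed to train
    an RL agent with that intrinsic reward and return its fitness.
    """
    population = [[random.gauss(0.0, 1.0) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=train_and_score, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        population = [[w + random.gauss(0.0, sigma)  # Gaussian mutation
                       for w in random.choice(parents)]
                      for _ in range(pop_size)]
    return max(population, key=train_and_score)
```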
A Lyapunov-based Approach to Safe Reinforcement Learning
Ofir Nachum
Mohammad Ghavamzadeh
NeurIPS (2018)
Abstract
In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. In particular, besides optimizing performance it is crucial to guarantee the safety of an agent during training as well as deployment (e.g. a robot should avoid taking actions - exploratory or not - which irrevocably harm its hardware). To incorporate safety in RL, we derive algorithms under the framework of constrained Markov decision problems (CMDPs), an extension of the standard Markov decision problems (MDPs) augmented with constraints on expected cumulative costs. Our approach hinges on a novel Lyapunov method. We define and present a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints. Leveraging these theoretical underpinnings, we show how to use the Lyapunov approach to systematically transform dynamic programming (DP) and RL algorithms into their safe counterparts. To illustrate their effectiveness, we evaluate these algorithms in several CMDP planning and decision-making tasks on a safety benchmark domain. Our results show that our proposed method significantly outperforms existing baselines in balancing constraint satisfaction and performance.
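In common notation (the symbols below are standard usage, not taken verbatim from the paper), the CMDP these algorithms solve is:

```latex
\max_{\pi}\;\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\right]
\quad\text{subject to}\quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,c(s_t,a_t)\right]\le d_0,
```

where r is the reward, c is the per-step constraint cost, and d_0 is the safety budget. The Lyapunov construction replaces the single global expected-cost constraint with a set of local, linear constraints that are sufficient for it, which is what makes the safe DP and RL counterparts tractable.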
Inequity aversion improves credit assignment in intertemporal social dilemmas
Edward Hughes
Heather Roff
Iain Robert Dunning
Joel Z Leibo
Karl Tuyls
Raphael Koster
Thore Graepel
ICML (2018)
Inequity aversion improves cooperation in intertemporal social dilemmas
Edward Hughes
Joel Z Leibo
Matthew Phillips
Karl Paul Tuyls
Antonio García Castañeda
Iain Robert Dunning
Tina Zhu
Kevin Robert McKee
Raphael Koster
Heather Roff
Thore Graepel
NeurIPS (2018)
Abstract
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
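The social preferences the abstract invokes follow the Fehr-Schmidt model of inequity aversion. A minimal sketch of the corresponding subjective reward (the coefficients are illustrative, and the paper's temporal smoothing of rewards is omitted here):

```python
def inequity_averse_reward(rewards, i, alpha=5.0, beta=0.05):
    """Fehr-Schmidt-style subjective reward for agent `i`.

    `alpha` penalizes disadvantageous inequity (others earn more),
    `beta` penalizes advantageous inequity (agent `i` earns more).
    Coefficients here are illustrative, not the paper's settings.
    """
    n = len(rewards)
    envy = sum(max(r_j - rewards[i], 0.0) for r_j in rewards) / (n - 1)
    guilt = sum(max(rewards[i] - r_j, 0.0) for r_j in rewards) / (n - 1)
    return rewards[i] - alpha * envy - beta * guilt

# Example: agent 0 earns less than its partners, so the envy term dominates.
print(inequity_averse_reward([1.0, 3.0, 2.0], i=0))  # -> -6.5
```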