Edgar A. Duéñez Guzmán

I am a software engineer and a scientist. I develop algorithms and models for complex problem solving. Currently I work at DeepMind on the Multi-Agent Research team, studying how to get artificial learning agents to cooperate in social dilemmas. My key expertise is in distributed systems, machine learning, natural computation, theory of computation, game theory, and evolutionary biology. I have an undergraduate degree in mathematics, a master's in computer science and industrial mathematics, and a doctorate in computer science. In my postdoctoral research I studied social evolution, swarm robotics, and genetics.
Authored Publications
    Safe Policy Learning for Continuous Control
    Ofir Nachum
    Mohammad Ghavamzadeh
    Conference on Robot Learning (CoRL) (2020)
    Abstract: We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through near-safe policies, i.e., policies that keep the agent in desirable situations, both during training and at convergence. We formulate these problems as constrained Markov decision processes (CMDPs) and present safe policy optimization algorithms that are based on a Lyapunov approach to solve them. Our algorithms can use any standard policy gradient (PG) method, such as deep deterministic policy gradient (DDPG) or proximal policy optimization (PPO), to train a neural network policy, while enforcing near-constraint satisfaction for every policy update by projecting either the policy parameter or the selected action onto the set of feasible solutions induced by the state-dependent linearized Lyapunov constraints. Compared to the existing constrained PG algorithms, ours are more data efficient as they are able to utilize both on-policy and off-policy data. Moreover, in practice our action-projection algorithm often leads to less conservative policy updates and allows for natural integration into an end-to-end PG training pipeline. We evaluate our algorithms and compare them with the state-of-the-art baselines on several simulated (MuJoCo) tasks, as well as a real-world robot obstacle-avoidance problem, demonstrating their effectiveness in terms of balancing performance and constraint satisfaction.
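The action projection described in this abstract can be sketched in its simplest form: a single state-dependent linearized constraint of the form g(s)·a + b(s) ≤ ε, for which the Euclidean projection of the proposed action onto the feasible half-space has a closed form. The function and variable names below are illustrative assumptions, not the authors' implementation, which derives such constraints from Lyapunov functions:

```python
import numpy as np

def project_action(a, g, b, eps):
    """Project a proposed action onto the half-space {a : g @ a + b <= eps}.

    Minimal sketch of a safety layer for one linearized, state-dependent
    constraint; the paper's method also supports projecting policy
    parameters instead of actions.
    """
    violation = g @ a + b - eps
    if violation <= 0.0:
        return a                              # already feasible: keep the action
    # Closed-form Euclidean projection onto the constraint boundary.
    return a - (violation / (g @ g)) * g

# Example: the policy proposes an action that would violate the constraint.
a = np.array([1.0, 2.0])
g = np.array([1.0, 1.0])                      # gradient of the linearized constraint
print(project_action(a, g, b=0.5, eps=1.0))   # -> [-0.25  0.75], exactly on the boundary
```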
    Evolving intrinsic motivations for altruistic behavior
    Jane Wang
    Edward Hughes
    Chrisantha Fernando
    Wojciech Czarnecki
    Joel Z Leibo
    AAMAS (2019)
    Abstract: Multi-agent cooperation is an important feature of the natural world. Many tasks involve individual incentives that are misaligned with the common good, yet a wide range of organisms from insects to humans display a remarkable ability to overcome their differences and collaborate. Therefore, the emergence of cooperative behaviour amongst self-interested individuals is an important question for the fields of multi-agent reinforcement learning (MARL) and evolutionary theory. Here, we study a particular class of multi-agent problems called intertemporal social dilemmas, where the conflict between the individual and the group is particularly sharp. By combining MARL with appropriately structured natural selection, we demonstrate that individual inductive biases for cooperation can be learned in a model-free way, in contrast to previous work. To achieve this, we introduce an innovative modular architecture for deep reinforcement learning agents which supports multi-level selection. We present state-of-the-art results in two challenging environments, and interpret these in the context of cultural and ecological evolution.
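A minimal sketch of the evolutionary outer loop this abstract alludes to: a population of agents whose intrinsic-reward weights are selected on extrinsic fitness and copied with mutation. The fitness stub, population size, and mutation scale below are placeholders, not the paper's setup, which evolves intrinsic motivations alongside model-free multi-agent RL training:

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_fitness(intrinsic_weights):
    """Placeholder for running MARL episodes with these intrinsic-reward
    weights and returning the agent's extrinsic (selective) fitness."""
    return -np.sum((intrinsic_weights - 0.5) ** 2) + rng.normal(scale=0.01)

# Each individual is a vector of intrinsic-reward weights (e.g. for others' outcomes).
population = [rng.uniform(0.0, 1.0, size=3) for _ in range(20)]

for generation in range(50):
    fitness = np.array([evaluate_fitness(w) for w in population])
    ranked = [population[i] for i in np.argsort(fitness)[::-1]]
    parents = ranked[: len(population) // 2]                    # truncation selection
    # Offspring inherit a parent's intrinsic motivation with small mutations.
    children = [p + rng.normal(scale=0.05, size=p.shape) for p in parents]
    population = parents + children
```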
    Inequity aversion improves cooperation in intertemporal social dilemmas
    Edward Hughes
    Joel Z Leibo
    Matthew Phillips
    Karl Paul Tuyls
    Antonio García Castañeda
    Iain Robert Dunning
    Tina Zhu
    Kevin Robert McKee
    Raphael Koster
    Heather Roff
    Thore Graepel
    NeurIPS (2018)
    Abstract: Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
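The inequity-averse social preference studied here can be sketched as a Fehr-Schmidt-style modification of each agent's reward, computed on temporally smoothed payoffs so that it applies beyond one-shot matrix games. The coefficients and smoothing constants below are illustrative defaults, not the paper's tuned settings:

```python
import numpy as np

def inequity_averse_rewards(rewards, traces, alpha=5.0, beta=0.05, gamma=0.99, lam=0.975):
    """Fehr-Schmidt-style subjective rewards on temporally smoothed payoffs.

    rewards: per-agent extrinsic rewards at this timestep, shape (N,).
    traces:  per-agent smoothed reward traces from the previous step, shape (N,).
    Returns the subjective rewards and the updated traces.
    """
    traces = gamma * lam * traces + rewards                 # update smoothed payoffs
    n = len(rewards)
    diffs = traces[None, :] - traces[:, None]               # diffs[i, j] = e_j - e_i
    envy = np.maximum(diffs, 0.0).sum(axis=1) / (n - 1)     # disadvantageous inequity
    guilt = np.maximum(-diffs, 0.0).sum(axis=1) / (n - 1)   # advantageous inequity
    return rewards - alpha * envy - beta * guilt, traces
```

Each agent would then train its policy on the subjective rather than the extrinsic reward; the envy (alpha) and guilt (beta) terms penalise disadvantageous and advantageous inequity respectively.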
    Inequity aversion improves credit assignment in intertemporal social dilemmas
    Edward Hughes
    Heather Roff
    Iain Robert Dunning
    Joel Z Leibo
    Karl Tuyls
    Raphael Koster
    Thore Graepel
    ICML (2018)
    Abstract: Multi-agent learning in social dilemmas has largely focused on cooperative behavior in stateless matrix games. Recent work has shown how these settings can be spatially and temporally extended to sequential social dilemmas (SSDs), a richer representation that better captures real world dynamics. Research from behavioral economics and evolutionary game theory indicates that most humans have preferences for social goals like fairness and reciprocity. Models based on these ideas have been successfully applied to predict and explain human behavior in a variety of laboratory settings. This paper contributes a new way of modeling agents with inequity-averse social preferences. By integrating methods from multi-agent deep reinforcement learning with models from behavioral economics, we can study ecologically plausible scenarios at scale. In particular, we consider multi-agent social dilemmas where short-term individual incentives clash with long-term collective interest. In these cases there is a significant temporal lag between the actions of free riders and their negative consequences for the group. We show that inequity aversion improves temporal credit assignment in these cases thus making large-scale long-term cooperation more likely to emerge and persist.
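To make the credit-assignment argument concrete, here is a toy two-agent illustration; the environment, payoffs, and beta coefficient are invented for exposition and are not taken from the paper:

```python
import numpy as np

# Toy two-agent common-pool step: agent 0 free-rides by over-harvesting.
stock = 10.0
harvest = np.array([3.0, 1.0])       # agent 0 takes more than its share
extrinsic = harvest.copy()           # the extra payoff arrives immediately...
stock -= harvest.sum()               # ...but the cost (a depleted pool and lower
                                     # future rewards) is delayed and shared by
                                     # everyone, so credit assignment is hard.

# An advantageous-inequity ("guilt") penalty hits the free rider in the same
# timestep as the over-harvest, shortening the lag between act and consequence.
beta = 0.5                           # illustrative coefficient, not a tuned value
guilt = beta * np.maximum(extrinsic - extrinsic[::-1], 0.0)
subjective = extrinsic - guilt
print(subjective)                    # [2. 1.]: the free rider is penalised immediately
```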
    A Lyapunov-based Approach to Safe Reinforcement Learning
    Yinlam Chow
    Ofir Nachum
    Mohammad Ghavamzadeh
    NeurIPS (2018)
    Abstract: In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. In particular, besides optimizing performance it is crucial to guarantee the safety of an agent during training as well as deployment (e.g. a robot should avoid taking actions - exploratory or not - which irrevocably harm its hardware). To incorporate safety in RL, we derive algorithms under the framework of constrained Markov decision problems (CMDPs), an extension of the standard Markov decision problems (MDPs) augmented with constraints on expected cumulative costs. Our approach hinges on a novel Lyapunov method. We define and present a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints. Leveraging these theoretical underpinnings, we show how to use the Lyapunov approach to systematically transform dynamic programming (DP) and RL algorithms into their safe counterparts. To illustrate their effectiveness, we evaluate these algorithms in several CMDP planning and decision-making tasks on a safety benchmark domain. Our results show that our proposed method significantly outperforms existing baselines in balancing constraint satisfaction and performance.
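A minimal sketch of the kind of local feasibility test such a Lyapunov function induces. The shapes, names, and numbers below are illustrative; constructing the Lyapunov function and the auxiliary-cost budget is the substance of the method and is not reproduced here:

```python
import numpy as np

def satisfies_lyapunov_constraint(pi_s, d_s, P_s, L, s, epsilon_s):
    """Check the local, linear constraint induced by a Lyapunov function at state s.

    pi_s:      candidate action distribution at s, shape (A,)
    d_s:       immediate constraint costs d(s, a), shape (A,)
    P_s:       transition kernel P(s' | s, a), shape (A, S)
    L:         Lyapunov function values over states, shape (S,)
    epsilon_s: state-dependent auxiliary cost budget at s
    """
    one_step_cost = d_s + P_s @ L            # expected constraint cost-to-go per action
    return float(pi_s @ one_step_cost) <= L[s] + epsilon_s

# Tiny 2-state, 2-action example with made-up numbers.
P_s = np.array([[0.9, 0.1],
                [0.2, 0.8]])
d_s = np.array([0.0, 1.0])
L = np.array([0.5, 2.0])
print(satisfies_lyapunov_constraint(np.array([1.0, 0.0]), d_s, P_s, L, s=0, epsilon_s=0.2))  # True
print(satisfies_lyapunov_constraint(np.array([0.0, 1.0]), d_s, P_s, L, s=0, epsilon_s=0.2))  # False
```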