Inequity aversion improves credit assignment in intertemporal social dilemmas

Edward Hughes
Heather Roff
Iain Robert Dunning
Joel Z Leibo
Karl Tuyls
Raphael Koster
Thore Graepel


Multi-agent learning in social dilemmas has largely focused on cooperative behavior in stateless matrix games. Recent work has shown how these settings can be extended spatially and temporally to sequential social dilemmas (SSDs), a richer representation that better captures real-world dynamics. Research from behavioral economics and evolutionary game theory indicates that most humans have preferences for social goals such as fairness and reciprocity. Models based on these ideas have successfully predicted and explained human behavior in a variety of laboratory settings. This paper contributes a new way of modeling agents with inequity-averse social preferences. By integrating methods from multi-agent deep reinforcement learning with models from behavioral economics, we can study ecologically plausible scenarios at scale. In particular, we consider multi-agent social dilemmas in which short-term individual incentives clash with long-term collective interest. In these cases there is a significant temporal lag between the actions of free riders and their negative consequences for the group. We show that inequity aversion improves temporal credit assignment in these cases, thus making large-scale, long-term cooperation more likely to emerge and persist.
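The inequity-averse preferences mentioned above are commonly formalized with the Fehr-Schmidt utility, in which an agent is penalized for earning less than others (envy, weight alpha) and for earning more than others (guilt, weight beta). The following is a minimal sketch, not the paper's implementation. It assumes the comparison is made on exponentially smoothed reward traces rather than on single-step rewards. The function names `smooth_rewards` and `inequity_averse_rewards` and the smoothing parameters `gamma` and `lam` are illustrative assumptions.

```python
from typing import List


def smooth_rewards(prev_traces: List[float], rewards: List[float],
                   gamma: float, lam: float) -> List[float]:
    """Update per-agent reward traces: e_i(t) = gamma * lam * e_i(t-1) + r_i(t).

    Assumption: smoothing is one plausible way to compare agents' recent
    returns over time instead of single-step rewards.
    """
    return [gamma * lam * e + r for e, r in zip(prev_traces, rewards)]


def inequity_averse_rewards(rewards: List[float], traces: List[float],
                            alpha: float, beta: float) -> List[float]:
    """Fehr-Schmidt-style subjective rewards computed from smoothed traces.

    u_i = r_i - alpha/(N-1) * sum_j max(e_j - e_i, 0)    (envy)
              - beta/(N-1)  * sum_j max(e_i - e_j, 0)    (guilt)
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("inequity aversion needs at least two agents")
    subjective = []
    for i in range(n):
        # Total amount by which others' traces exceed agent i's trace.
        disadvantageous = sum(max(traces[j] - traces[i], 0.0)
                              for j in range(n) if j != i)
        # Total amount by which agent i's trace exceeds the others' traces.
        advantageous = sum(max(traces[i] - traces[j], 0.0)
                           for j in range(n) if j != i)
        subjective.append(rewards[i]
                          - alpha / (n - 1) * disadvantageous
                          - beta / (n - 1) * advantageous)
    return subjective


if __name__ == "__main__":
    # Two hypothetical agents: agent 0 collects reward every step, agent 1 does not.
    traces = [0.0, 0.0]
    for step in range(3):
        r = [1.0, 0.0]
        traces = smooth_rewards(traces, r, gamma=1.0, lam=0.5)
        print(step, inequity_averse_rewards(r, traces, alpha=1.0, beta=0.5))
```

In this toy run, agent 0's trace grows from 1.0 to 1.5 to 1.75. Its subjective reward therefore falls from 0.5 to 0.25 to 0.125 even though its extrinsic reward stays at 1.0. One intuition for the credit-assignment claim is that the smoothed inequity term turns a slowly accumulating advantage into an immediate learning signal. The agent does not have to wait for the delayed collective consequence to feel the cost.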