Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO
Abstract
Recent advances in multiagent learning have seen the introduction of a family of algorithms built around the population-based training method PSRO, with convergence guarantees to Nash, correlated, and coarse correlated equilibria. Notably, as the number of agents increases, learning best responses becomes exponentially harder, which hampers PSRO-based training methods. The field of Mean-Field Games provides an asymptotic solution to this problem when the games considered are anonymous and symmetric. Unfortunately, the Mean-Field approximation introduces non-linearities that prevent a straightforward adaptation of PSRO. Building upon optimization and adversarial regret minimization, this paper sidesteps this issue and introduces Mean-Field PSRO, an adaptation of PSRO that learns Nash, coarse correlated, and correlated equilibria in Mean-Field Games. The key is to replace the exact distribution computation step with newly defined Mean-Field no-adversarial-regret learners, or with black-box optimization. We compare the asymptotic complexity of the approach with that of standard PSRO, greatly improve empirical bandit convergence speed by compressing temporal mixture weights, and ensure the method is theoretically robust to payoff noise. Finally, we illustrate the speed and accuracy of Mean-Field PSRO on several Mean-Field Games, demonstrating convergence to strong and weak equilibria.