Pareto-Optimal Algorithms for Learning in Games

Eshwar Ram Arunachaleswaran
Natalie Collina
2024

Abstract

We study the problem of characterizing optimal learning algorithms for playing repeated games. While it is well understood how to play a repeated game when the other player's payoff is known, it is not clear what constitutes an optimal algorithm to commit to against an unknown but rational player with full information (called an optimizer). One approach is to employ a no-regret algorithm, which provide some counterfactual guarantees for the learner, though they are not necessarily optimal against many optimizers. In this paper, we introduce the natural notion of \emph{pareto optimal learning algorithms}, algorithms which are not pareto-dominated over the space of possible optimizer payoffs. Intuitively, if a learning algorithm is pareto-optimal, then there is no other algorithm which performs at least as well against all optimizer payoffs and performs strictly better against some optimizer payoff.

We investigate the interplay between the no-regret guarantee and the pareto-optimality guarantee. We show that these properties are incomparable, and furthermore that well known no-regret algorithms such as multiplicative weights and follow the regularized leader are pareto-dominated. However, while no-reget is not enough to ensure pareto-optimality, we show that a strictly stronger property, no-swap-regret, is a sufficient condition for pareto-optimality.

Proving these results require us to address various technical challenges specific to repeated play, including the fact that there is no simple characterization of how optimizers who are rational in the long-term best-respond against a learning algorithm over multiple rounds of play. Consequently, our work employs a geometric understanding of all effective transcripts generated by a learning algorithm, which we refer to as a menu. We prove interesting technical results about the asymptotic properties of these menus. Notably, we show that up to sublinear factors, all algorithms with the no-swap-regret property are operationally identical.
×