Deep reinforcement learning has seen many remarkable successes over the past few years. However, progress is hindered by challenges faced in reinforcement learning, such as large variability in performance, catastrophic forgetting, and overfitting to particular states.
We propose Erdos-Selfridge-Spencer games as a reinforcement learning testbed. We focus in particular on one of the best-known games in this genre, Spencer’s attacker-defender game, also known as the “tenure game”. This game has several nice properties: it is (i) a low-dimensional, simply parametrized environment where (ii) there is a linear closed form solution for optimal behavior from any state, and (iii) the difficulty of the game can be tuned by changing environment parameters in an interpretable way. We compare several RL methods to the tenure game, examining their performance given varying environment difficulty and their generalization to environments outside the training set.