Raphaël Marinier
Authored Publications
Solving N-player dynamic routing games with congestion: a mean field approach (extended abstract)
Alexandre Bayen
Eric Goubault
Julien Perolat
Mathieu Lou-roch Lauriere
Olivier Pietquin
Romuald Elie
Sarah Perrin
Sertan Girgin
Theophile Cabannes
(2022)
Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO
Paul Muller
Mark Rowland
Romuald Elie
Georgios Piliouras
Julien Perolat
Mathieu Lou-roch Lauriere
Olivier Pietquin
Karl Paul Tuyls
Proc. of AAMAS 2022 (2022)
Abstract
Recent advances in multiagent learning have seen the introduction of a family of algorithms that revolve around the population-based training method PSRO and show convergence to Nash, correlated, and coarse correlated equilibria. Notably, as the number of agents increases, learning best responses becomes exponentially more difficult, which hampers PSRO training methods. The field of Mean-Field Games provides an asymptotic solution to this problem when the games considered are anonymous-symmetric. Unfortunately, the Mean-Field approximation introduces non-linearities that prevent a straightforward adaptation of PSRO. Building upon optimization and adversarial regret minimization, this paper sidesteps this issue and introduces Mean-Field PSRO, an adaptation of PSRO that learns Nash, coarse correlated, and correlated equilibria in Mean-Field Games. The key is to replace the exact distribution-computation step with newly defined Mean-Field no-adversarial-regret learners, or with black-box optimization. We compare the asymptotic complexity of the approach to that of standard PSRO, greatly improve empirical bandit convergence speed by compressing temporal mixture weights, and ensure the method is theoretically robust to payoff noise. Finally, we illustrate the speed and accuracy of Mean-Field PSRO on several Mean-Field Games, demonstrating convergence to strong and weak equilibria.
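As a rough illustration of the no-adversarial-regret machinery the abstract mentions, here is a minimal exponential-weights (Hedge) sketch over a finite set of policies; the toy payoff function and all names are hypothetical placeholders, not the paper's implementation:

```python
import numpy as np

def hedge_mixture(payoff_fn, n_policies, lr=0.5, steps=500):
    """Exponential-weights (Hedge): a classic no-adversarial-regret learner
    over a finite policy set. payoff_fn(mixture) returns, for each policy,
    its expected payoff when the population plays `mixture`. The average of
    the iterates is the mixture one would use as the population distribution."""
    w = np.zeros(n_policies)           # log-weights
    avg = np.zeros(n_policies)         # running average of mixtures
    for t in range(1, steps + 1):
        mix = np.exp(w - w.max())
        mix /= mix.sum()
        avg += (mix - avg) / t         # incremental average
        w += lr * payoff_fn(mix)       # reward-following update
    return avg

# Toy anonymous-symmetric game: congestion-style payoffs, where a policy's
# value decreases with the population mass already using it (placeholder).
payoffs = lambda mix: 1.0 - mix
print(hedge_mixture(payoffs, n_policies=3))  # converges to a roughly uniform mixture
```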
What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz
Piotr Michal Stanczyk
Manu Orsini
Sertan Girgin
Léonard Hussenot
Matthieu Geist
Olivier Pietquin
Marcin Michalski
Sylvain Gelly
ICLR (2021)
Abstract
In recent years, reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancies between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement more than 50 such "choices" in a unified on-policy deep actor-critic framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250,000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for the training of on-policy deep actor-critic RL agents.
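Purely as an illustration of what implementing many such "choices" in one unified framework can look like, a hedged sketch where each design decision is an explicit, sweepable flag (all field names are invented for this example and are not the paper's actual configuration):

```python
from dataclasses import dataclass, replace

@dataclass
class ActorCriticConfig:
    # A few of the low- and high-level choices such a study sweeps over
    # (illustrative names only).
    advantage_estimator: str = "gae"     # e.g. "gae" or "n_step"
    gae_lambda: float = 0.95
    normalize_advantages: bool = True
    normalize_observations: bool = True
    clip_value_loss: bool = False
    policy_lr: float = 3e-4
    activation: str = "tanh"

def sweep(base: ActorCriticConfig, **overrides) -> ActorCriticConfig:
    """One agent per configuration: vary a single choice at a time so its
    effect can be isolated in a controlled large-scale comparison."""
    return replace(base, **overrides)

cfg = sweep(ActorCriticConfig(), normalize_advantages=False)
```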
Hyperparameter Selection for Imitation Learning
Léonard Hussenot
Marcin Andrychowicz
Damien Vincent
Lukasz Piotr Stafiniak
Sertan Girgin
Nikola M Momchev
Manu Orsini
Matthieu Geist
Olivier Pietquin
ICML (2021)
Abstract
We address the issue of tuning hyperparameters (HPs) for imitation learning algorithms when the underlying reward function of the demonstrating expert cannot be observed at any time. The vast literature on imitation learning mostly considers this reward function to be available for HP selection, but this is not a realistic setting. Indeed, were this reward function available, it could be used directly for policy training, and imitation would not make sense. To tackle this largely ignored problem, we propose and study, in an extensive empirical study across representative agents and benchmarks, a number of possible proxies for the return. We observe that, depending on the algorithm and the environment, some methods allow good performance to be achieved without using the unknown return.
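To make the idea of a "proxy for the return" concrete, here is a minimal sketch of one plausible proxy, assuming held-out expert demonstrations with discrete actions are available; the paper studies several proxies, and this particular one is only an illustration:

```python
import numpy as np

def action_agreement(policy, expert_states, expert_actions):
    """Proxy for the unobservable return: fraction of held-out expert
    (state, action) pairs on which the trained imitation policy selects
    the expert's action. `policy` maps one state to a discrete action."""
    predicted = np.array([policy(s) for s in expert_states])
    return float(np.mean(predicted == np.asarray(expert_actions)))

# HP selection: train one policy per candidate hyperparameter setting and
# keep the setting whose policy scores highest on the validation split.
```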
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
Piotr Michal Stanczyk
Marcin Michalski
International Conference on Learning Representations (2020)
Abstract
We present a modern scalable reinforcement learning agent called SEED (Scalable, Efficient Deep-RL). By effectively utilizing modern accelerators, we show that it is not only possible to train on millions of frames per second but also to lower the cost of experiments compared to current methods. We achieve this with a simple architecture that features centralized inference and an optimized communication layer. SEED adopts two state-of-the-art distributed algorithms, IMPALA/V-trace (policy gradients) and R2D2 (Q-learning), and is evaluated on Atari-57, DeepMind Lab and Google Research Football. We improve the state of the art on Football and are able to reach the state of the art on Atari-57 three times faster in wall-clock time. For the scenarios we consider, a 40% to 80% cost reduction for running experiments is achieved. The implementation, along with experiments, is open-sourced so that results can be reproduced and novel ideas tried out.
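A minimal sketch of the centralized-inference idea: actors only step environments, while a single server batches their observations for one accelerator-side forward pass. The queue-based transport below is a stand-in for SEED's optimized gRPC streaming layer, not its actual implementation:

```python
import queue
import numpy as np

BATCH = 4
requests = queue.Queue()   # (actor_id, observation) pairs from all actors
replies = {}               # actor_id -> per-actor queue of actions

def inference_server(model):
    """Central inference loop: gather a batch of observations, run one
    batched forward pass on the central model, and scatter the resulting
    actions back to the actors that asked for them."""
    while True:
        batch = [requests.get() for _ in range(BATCH)]
        ids, obs = zip(*batch)
        actions = model(np.stack(obs))            # single batched call
        for actor_id, action in zip(ids, actions):
            replies[actor_id].put(action)

def actor(actor_id, env):
    """Actors hold no neural network: they only step the environment and
    exchange observations/actions with the central server."""
    replies[actor_id] = queue.Queue()
    obs = env.reset()
    while True:
        requests.put((actor_id, obs))
        obs, reward, done, _ = env.step(replies[actor_id].get())
        if done:
            obs = env.reset()

# In SEED proper, one server and thousands of actors run concurrently,
# connected by a gRPC streaming layer rather than in-process queues.
```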
Abstract
The ability to transfer representations to novel environments and tasks is a sensible desideratum for general learning agents. Despite its apparent promise, transfer in Reinforcement Learning is still an open and under-exploited research area. In this paper, we suggest that credit assignment, regarded as a supervised learning task, could be used to accomplish transfer. Our contribution is twofold: we introduce a new credit assignment mechanism based on self-attention, and we show that the learned credit can be transferred to in-domain and out-of-domain scenarios.
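As a hedged illustration of reading per-step credit out of attention weights (the model below is a toy, not the paper's architecture; the weight matrices stand in for a network trained on a supervised prediction task):

```python
import numpy as np

def attention_credit(states, outcome_query, W_q, W_k):
    """Toy read-out of credit from a self-attention layer: the attention
    distribution over past states is interpreted as how much each step
    contributed to the outcome represented by `outcome_query`."""
    q = outcome_query @ W_q                    # (d,)
    k = states @ W_k                           # (T, d)
    logits = k @ q / np.sqrt(q.shape[0])       # scaled dot-product scores
    credit = np.exp(logits - logits.max())     # numerically stable softmax
    return credit / credit.sum()               # per-step credit, sums to 1
```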
Episodic Curiosity through Reachability
Nikolay Savinov
Damien Vincent
Marc Pollefeys
Timothy Lillicrap
Sylvain Gelly
ICLR (2019)
Abstract
Rewards are sparse in the real world, and most of today's reinforcement learning algorithms struggle with such sparsity. One solution to this problem is to allow the agent to create rewards for itself, thus making rewards dense and more suitable for learning. In particular, inspired by curious behaviour in animals, observing something novel could be rewarded with a bonus. Such a bonus is summed with the real task reward, making it possible for RL algorithms to learn from the combined reward. We propose a new curiosity method which uses episodic memory to form the novelty bonus. To determine the bonus, the current observation is compared with the observations in memory. Crucially, the comparison is done based on how many environment steps it takes to reach the current observation from those in memory, which incorporates rich information about environment dynamics. This allows us to overcome the known "couch-potato" issue of prior work, where the agent finds a way to instantly gratify itself by exploiting actions which lead to hardly predictable consequences. We test our approach in visually rich 3D environments in VizDoom, DMLab and MuJoCo. In navigational tasks from VizDoom and DMLab, our agent outperforms the state-of-the-art curiosity method ICM. In MuJoCo, an ant equipped with our curiosity module learns locomotion from first-person-view curiosity alone. The code is available at https://github.com/google-research/episodic-curiosity.
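A condensed sketch of the reachability idea, assuming a trained network that estimates whether one observation is reachable from another within k environment steps; aggregation and bookkeeping details are simplified relative to the paper:

```python
def curiosity_bonus(embedding, memory, reachability_net,
                    novelty_threshold=0.5, alpha=1.0, beta=0.5):
    """Bonus is high when the current observation is many environment steps
    away from everything in episodic memory. `reachability_net(a, b)` is
    assumed to return P(b reachable from a within k steps) for embeddings."""
    if not memory:
        memory.append(embedding)
        return 0.0
    # Aggregate similarity to memory (the paper uses a percentile over
    # memory entries; taking the max is a simplification for this sketch).
    reachable = max(reachability_net(m, embedding) for m in memory)
    bonus = alpha * (beta - reachable)     # positive for hard-to-reach obs
    if reachable < novelty_threshold:      # novel enough: remember it
        memory.append(embedding)
    return bonus
```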
Towards Accurate Generative Models of Video: A New Metric & Challenges
Thomas Unterthiner
Karol Kurach
Marcin Michalski
Sylvain Gelly
arXiv (2018)
Abstract
Recent advances in deep generative models have led to remarkable progress in synthesizing high-quality images. Following their successful application in image processing and representation learning, an important next step is to consider videos. Learning generative models of video is a much harder task, requiring a model to capture the temporal dynamics of a scene in addition to the visual presentation of objects. While recent attempts at formulating generative models of video have had some success, current progress is hampered by (1) the lack of qualitative metrics that consider visual quality, temporal coherence, and diversity of samples, and (2) the wide gap between purely synthetic video data sets and challenging real-world data sets in terms of complexity. To this end, we propose Fréchet Video Distance (FVD), a new metric for generative models of video, and StarCraft 2 Videos (SCV), a benchmark of gameplay from custom StarCraft 2 scenarios that challenge the current capabilities of generative models of video. We contribute a large-scale human study, which confirms that FVD correlates well with qualitative human judgment of generated videos, and provide initial benchmark results on SCV.
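FVD, like FID for images, is a Fréchet (2-Wasserstein) distance between Gaussians fitted to features of real and generated samples. A sketch of the distance computation itself, given two feature matrices (the pretrained video feature extractor, e.g. an I3D-style network, is assumed and not shown):

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_gen):
    """Squared 2-Wasserstein distance between Gaussians fit to two feature
    sets (rows = samples): ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)).
    For FVD, the rows would be video-network features of real/generated clips."""
    mu1, mu2 = feats_real.mean(0), feats_gen.mean(0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(s1 @ s2).real    # discard tiny imaginary residue
    return float(((mu1 - mu2) ** 2).sum() + np.trace(s1 + s2 - 2 * covmean))
```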