Alexis David Jacq

I did my PhD in Human-Robot Interactions at EPFL in Lausanne and at IST in Lisbon, and my master's in machine learning (MVA) at ENS Cachan. Before joining as a research scientist, I was an intern at Google, working on multi-agent reinforcement learning: how agents can learn from each other and cooperate. My research interests cover all aspects of reinforcement learning, including exploration, imitation and multi-agent learning.
Authored Publications
    Lazy-MDPs: Towards Interpretable RL by Learning When to Act
    Johan Ferret
    Matthieu Geist
    Olivier Pietquin
    AAMAS 2022
    Traditionally, Reinforcement Learning (RL) aims at deciding how to act optimally for an artificial agent. We argue that deciding when to act is equally important. As humans, we drift from default, instinctive or memorized behaviors to focused, thought-out behaviors when required by the situation. To enhance RL agents with this aptitude, we propose to augment the standard Markov Decision Process and make a new mode of action available: the lazy mode, which defers decision-making to a given default policy. In addition, we penalize non-lazy actions in order to enforce minimal effort and have agents focus on critical decisions only. We name the resulting formalism lazy-MDPs. We study the theoretical properties of lazy-MDPs, expressing value functions and characterizing greediness and optimal solutions. Then we empirically demonstrate that policies learned in lazy-MDPs are generally more interpretable and highlight the states where it is important for the agent to act. When the default policy is uniformly random, we observe that agents are still able to approximate or even to surpass classic DQN agents on some Atari games while only taking control in a small subset of the states.
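    The following is a minimal sketch (not the paper's implementation) of the lazy-MDP idea from the abstract above: an environment wrapper that adds one extra "lazy" action deferring to a default policy, and charges a penalty whenever the agent takes control itself. The names `LazyWrapper`, `default_policy` and `lazy_penalty` are illustrative assumptions.

```python
# Illustrative sketch of a lazy-MDP-style augmentation, assuming a Gymnasium-style
# environment with a discrete action space. Not the authors' code.
import gymnasium as gym


class LazyWrapper(gym.Wrapper):
    def __init__(self, env, default_policy, lazy_penalty=0.1):
        super().__init__(env)
        self.default_policy = default_policy          # maps observation -> action
        self.lazy_penalty = lazy_penalty              # cost of taking control
        self.lazy_action = env.action_space.n         # extra action index = "be lazy"
        self.action_space = gym.spaces.Discrete(env.action_space.n + 1)
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        if action == self.lazy_action:
            # Defer the decision to the default policy; no penalty.
            obs, reward, terminated, truncated, info = self.env.step(
                self.default_policy(self._last_obs))
        else:
            # The agent takes control: apply the non-lazy penalty.
            obs, reward, terminated, truncated, info = self.env.step(action)
            reward -= self.lazy_penalty
        self._last_obs = obs
        return obs, reward, terminated, truncated, info
```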
    Foolproof Cooperative Learning
    Julien Perolat
    Matthieu Geist
    Olivier Pietquin
    Proceedings of ACML 2020
    This paper extends the notion of equilibrium in game theory to learning algorithms in repeated stochastic games. We define a learning equilibrium as an algorithm used by a population of players, such that no player can individually use an alternative algorithm and increase its asymptotic score. We introduce Foolproof Cooperative Learning (FCL), an algorithm that converges to a Tit-for-Tat behavior. It allows cooperative strategies when played against itself while not being exploitable by selfish players. We prove that in repeated symmetric games, this algorithm is a learning equilibrium. We illustrate the behavior of FCL on symmetric matrix and grid games, and its robustness to selfish learners.
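    As a toy illustration of the Tit-for-Tat behavior that FCL converges to (this is not the FCL algorithm itself, just a hand-written strategy for the iterated prisoner's dilemma), the sketch below cooperates by default and retaliates after a defection, so a constant defector gains no asymptotic advantage over it.

```python
# Minimal illustration: Tit-for-Tat in the iterated prisoner's dilemma.
# Payoff matrix and strategy names are standard textbook choices, not from the paper.
COOPERATE, DEFECT = 0, 1
PAYOFFS = {  # (my action, their action) -> my payoff
    (COOPERATE, COOPERATE): 3, (COOPERATE, DEFECT): 0,
    (DEFECT, COOPERATE): 5,    (DEFECT, DEFECT): 1,
}


def tit_for_tat(history):
    """Cooperate on the first round, then mirror the partner's last action."""
    if not history:
        return COOPERATE
    _, their_last = history[-1]
    return their_last


def play(strategy_a, strategy_b, rounds=10):
    """Play the repeated game and return both cumulative scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(history_a)
        b = strategy_b(history_b)
        score_a += PAYOFFS[(a, b)]
        score_b += PAYOFFS[(b, a)]
        history_a.append((a, b))
        history_b.append((b, a))
    return score_a, score_b


# Against itself the strategy cooperates throughout; against a constant defector
# it defects from round two onwards, so per-round defection gains vanish asymptotically.
print(play(tit_for_tat, tit_for_tat))
print(play(tit_for_tat, lambda history: DEFECT))
```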
    Learning from a Learner
    Matthieu Geist
    Ana Paiva
    Olivier Pietquin
    ICML (2019)
    In this paper, we propose a novel setting for Inverse Reinforcement Learning (IRL), namely "Learning from a Learner" (LfL). As opposed to standard IRL, it does not consist in learning a reward by observing an optimal agent, but rather from observations of another learning (and thus sub-optimal) agent. To do so, we leverage the fact that the observed agent's policy is assumed to improve over time. The ultimate goal of this approach is to recover the actual environment's reward and to allow the observer to outperform the learner. To recover that reward in practice, we propose methods based on the entropy-regularized policy iteration framework. We discuss different approaches to learn solely from trajectories in the state-action space. We demonstrate the genericity of our method by observing agents implementing various reinforcement learning algorithms. Finally, we show that, on both discrete and continuous state/action tasks, the observer's performance (that optimizes the recovered reward) can surpass that of the observed agent.
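    A rough sketch of the intuition behind entropy-regularized LfL, as I understand it from the abstract (a simplification, not the authors' estimator): under softmax policy improvement, pi_{k+1}(a|s) is proportional to pi_k(a|s) * exp(Q_k(s,a) / tau), so the log-ratio of two consecutive observed policies reveals the Q-values up to a per-state constant, from which a reward can be estimated. Function names here are illustrative assumptions.

```python
# Tabular sketch: read off advantages from two consecutive softmax-improved policies.
import numpy as np


def soft_improve(pi, q, tau=1.0):
    """One entropy-regularized improvement step of a tabular policy pi[s, a]."""
    logits = np.log(pi) + q / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)


def recover_advantage(pi_old, pi_new, tau=1.0):
    """Recover Q up to a per-state constant from the observed improvement."""
    return tau * (np.log(pi_new) - np.log(pi_old))


rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
q_true = rng.normal(size=(n_states, n_actions))
pi0 = np.full((n_states, n_actions), 1.0 / n_actions)
pi1 = soft_improve(pi0, q_true)

adv_est = recover_advantage(pi0, pi1)
# The estimate matches the true Q-values up to a per-state constant (the log-normalizer).
print(np.allclose(adv_est - adv_est.mean(axis=1, keepdims=True),
                  q_true - q_true.mean(axis=1, keepdims=True)))
```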