Olivier Bachem

Olivier Bachem

Olivier is a research scientist in the Google Brain Team interested in fundamental problems in machine learning and artificial intelligence. He received his PhD from ETH Zurich where he was supervised by Andreas Krause in the Learning & Adaptive Systems group. In his dissertation, he investigated coresets - small summaries of large data sets with theoretical guarantees - and other sampling methods for large-scale clustering. He also held a Google PhD Fellowship in Machine Learning and was an Associated Fellow at the Max Planck ETH Center for Learning Systems. Before that, he obtained a bachelor’s degree in economics (University of St. Gallen), a master’s degree in quantitative finance (ETH Zurich & University of Zurich) as well as a master’s degree in statistics (ETH Zurich) where he was awarded an ETH medal for his master thesis.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
    Paul Roit
    Johan Ferret
    Geoffrey Cideron
    Matthieu Geist
    Sertan Girgin
    Léonard Hussenot
    Nikola Momchev
    Piotr Stanczyk
    Nino Vieillard
    Olivier Pietquin
    Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (2023), 6252–6272
    Preview abstract Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work we leverage recent progress on textual entailment models to directly address this problem for abstractive summarization systems. We use reinforcement learning with reference-free, textual-entailment rewards to optimize for factual consistency and explore the ensuing trade-offs, as improved consistency may come at the cost of less informative or more extractive summaries. Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience and conciseness of the generated summaries. View details
    A general class of surrogate functions for stable and efficient reinforcement learning
    Sharan Vaswani
    Simone Totaro
    Robert Müller
    Shivam Garg
    Matthieu Geist
    Marlos C. Machado
    Nicolas Le Roux
    AISTATS (2022)
    Preview abstract Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO, or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives rise to an entire family of surrogate functions. We construct surrogate functions that enable policy improvement guarantees, a property not shared by most existing surrogate functions. Crucially, these guarantees hold regardless of the choice of policy parameterization. Moreover, a particular instantiation of FMA-PG recovers important implementation heuristics (e.g., using forward vs reverse KL divergence) resulting in a variant of TRPO with additional desirable properties. Via experiments on simple reinforcement learning problems, we evaluate the algorithms instantiated by FMA-PG. The proposed framework also suggests an improved variant of PPO, whose robustness and efficiency we empirically demonstrate on the MuJoCo suite. View details
    Offline Reinforcement Learning as Anti-Exploration
    Shideh Rezaeifar
    Nino Vieillard
    Léonard Hussenot
    Olivier Pietquin
    Matthieu Geist
    AAAI (2022)
    Preview abstract Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system. An agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data. This is the converse of exploration in RL, which favors such actions. We thus take inspiration from the literature on bonus-based exploration to design a new offline RL agent. The core idea is to subtract a prediction-based exploration bonus from the reward instead of adding it for exploration. This allows the policy to stay close to the support of the dataset. We connect this approach to a more usual regularization of the learnt policy towards the data. Instantiated with a bonus based on the prediction error of a variational autoencoder, we show that our agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks. View details
    Preview abstract Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a "query decoder" that, given a latent representation of a neural search engine, generates the corresponding query. We show that it is possible to decode a meaningful query from its latent representation and, when moving in the right direction in latent space, to decode a query that retrieves the relevant paragraph. In particular, the query decoder can be useful to understand "what should have been asked" to retrieve a particular paragraph from the collection. We employ the query decoder to generate a large synthetic dataset of query reformulations for MSMarco, leading to improved retrieval performance. On this data, we train a pseudo-relevance feedback (PRF) T5 model for the application of query suggestion that outperforms both query reformulation and PRF information retrieval baselines. View details
    Concave Utility Reinforcement Learning: the Mean-field Game viewpoint
    Matthieu Geist
    Julien Perolat
    Mathieu Laurière
    Romuald Elie
    Sarah Perrin
    Remi Munos
    Olivier Pietquin
    AAMAS (2022)
    Preview abstract Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy. This encompasses not only RL but also imitation learning and exploration, among others. Yet, this more general paradigm invalidates the classical Bellman equations, and calls for new algorithms. Mean-field Games (MFGs) are a continuous approximation of many-agent RL. They consider the limit case of a continuous distribution of identical agents, anonymous with symmetric interests, and reduce the problem to the study of a single representative agent in interaction with the full population. Our core contribution consists in showing that CURL is a subclass of MFGs. We think this important to bridge together both communities. It also allows to shed light on aspects of both fields: we show the equivalence between concavity in CURL and monotonicity in the associated MFG, between optimality conditions in CURL and Nash equilibrium in MFG, or that Fictitious Play (FP) for this class of MFGs is simply Frank-Wolfe, bringing the first convergence rate for discrete-time FP for MFGs. We also experimentally demonstrate that, using algorithms recently introduced for solving MFGs, we can address the CURL problem more efficiently. View details
    What Matters for Adversarial Imitation Learning?
    Manu Orsini
    Léonard Hussenot
    Damien Vincent
    Sertan Girgin
    Matthieu Geist
    Olivier Pietquin
    Marcin Andrychowicz
    NeurIPS (2021)
    Preview abstract Adversarial imitation learning has become a standard framework for imitation in continuous control. Over the years, several variations of its components were proposed to enhance the performance of the learned policies as well as the sample complexity of the algorithm. In practice, many of these choices are rarely tested all together in rigorous empirical studies. It is therefore difficult to discuss and understand what choices, among the high-level algorithmic options as well as low-level implementation details, matter. To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impacts in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations. We analyze the key results and highlight the most surprising findings. View details
    What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
    Marcin Andrychowicz
    Piotr Michal Stanczyk
    Manu Orsini
    Sertan Girgin
    Léonard Hussenot
    Matthieu Geist
    Olivier Pietquin
    Marcin Michalski
    Sylvain Gelly
    ICLR (2021)
    Preview abstract In recent years, reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancy between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement >50 such ``"choices" in a unified on-policy deep actor-critic framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250'000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for the training of on-policy deep actor-critic RL agents. View details
    Hyperparameter Selection for Imitation Learning
    Léonard Hussenot
    Marcin Andrychowicz
    Damien Vincent
    Lukasz Piotr Stafiniak
    Sertan Girgin
    Nikola M Momchev
    Manu Orsini
    Matthieu Geist
    Olivier Pietquin
    ICML (2021)
    Preview abstract We address the issue of tuning hyperparameters (HPs) for imitation learning algorithms when the underlying reward function of the demonstrating expert cannot be observed at any time. The vast literature in imitation learning mostly considers this reward function to be available for HP selection, although this is not a realistic setting. Indeed, would this reward function be available, it should then directly be used for policy training and imitation would not make sense. To tackle this mostly ignored problem, we propose and study, for different representative agents and benchmarks, a number of possible proxies to the return, within an extensive empirical study. We observe that, depending on the algorithm and the environment, some methods allow good performance to be achieved without using the unknown return. View details
    A Commentary on the Unsupervised Learning of Disentangled Representations
    Francesco Locatello
    Stefan Bauer
    Gunnar Rätsch
    Sylvain Gelly
    Bernhard Scholkopf
    AAAI Conference on Artificial Intelligence (2020)
    Preview abstract The goal of the unsupervised learning of disentangled representations is to separate the independent explanatory factors of variation in the data without access to supervision. In this paper, we summarize the results of (Locatello et al. 2019b) and focus on their implications for practitioners. We discuss the theoretical result showing that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases and the practical challenges it entails. Finally, we comment on our experimental findings, highlighting the limitations of state-of-the-art approaches and directions for future research. View details
    Evaluating Generative Models using Divergence Frontiers
    Josip Djolonga
    Marco Cuturi
    Sylvain Gelly
    International Conference on Artificial Intelligence and Statistics (2020)
    Preview abstract Despite the tremendous progress in the estimation of generative models, the development of tools for diagnosing their failures and assessing their performance has advanced at a much slower pace. Very recent developments have investigated metrics that quantify which parts of the true distribution is well modeled, and, on the contrary, what the model fails to capture, akin to precision and recall in information retrieval. In this paper we present a general evaluation framework for generative models that measures the trade-off between precision and recall using R\'enyi divergences. Our framework provides a novel perspective on existing techniques and extends them to more general domains. As a key advantage, it allows for efficient algorithms that are directly applicable to continuous distributions directly without discretization. We further showcase the proposed techniques on a set of image synthesis models. View details