Jump to Content
Olivier Bachem

Olivier Bachem

Olivier is a research scientist in the Google Brain Team interested in fundamental problems in machine learning and artificial intelligence. He received his PhD from ETH Zurich where he was supervised by Andreas Krause in the Learning & Adaptive Systems group. In his dissertation, he investigated coresets - small summaries of large data sets with theoretical guarantees - and other sampling methods for large-scale clustering. He also held a Google PhD Fellowship in Machine Learning and was an Associated Fellow at the Max Planck ETH Center for Learning Systems. Before that, he obtained a bachelor’s degree in economics (University of St. Gallen), a master’s degree in quantitative finance (ETH Zurich & University of Zurich) as well as a master’s degree in statistics (ETH Zurich) where he was awarded an ETH medal for his master thesis.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, desc
  • Year
  • Year, desc
    Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
    Paul Roit
    Johan Ferret
    Geoffrey Cideron
    Matthieu Geist
    Sertan Girgin
    Léonard Hussenot
    Nikola Momchev
    Piotr Stanczyk
    Nino Vieillard
    Olivier Pietquin
    Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (2023), 6252–6272
    Preview abstract Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work we leverage recent progress on textual entailment models to directly address this problem for abstractive summarization systems. We use reinforcement learning with reference-free, textual-entailment rewards to optimize for factual consistency and explore the ensuing trade-offs, as improved consistency may come at the cost of less informative or more extractive summaries. Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience and conciseness of the generated summaries. View details
    Concave Utility Reinforcement Learning: the Mean-field Game viewpoint
    Matthieu Geist
    Julien Perolat
    Mathieu Laurière
    Romuald Elie
    Sarah Perrin
    Remi Munos
    Olivier Pietquin
    AAMAS (2022)
    Preview abstract Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy. This encompasses not only RL but also imitation learning and exploration, among others. Yet, this more general paradigm invalidates the classical Bellman equations, and calls for new algorithms. Mean-field Games (MFGs) are a continuous approximation of many-agent RL. They consider the limit case of a continuous distribution of identical agents, anonymous with symmetric interests, and reduce the problem to the study of a single representative agent in interaction with the full population. Our core contribution consists in showing that CURL is a subclass of MFGs. We think this important to bridge together both communities. It also allows to shed light on aspects of both fields: we show the equivalence between concavity in CURL and monotonicity in the associated MFG, between optimality conditions in CURL and Nash equilibrium in MFG, or that Fictitious Play (FP) for this class of MFGs is simply Frank-Wolfe, bringing the first convergence rate for discrete-time FP for MFGs. We also experimentally demonstrate that, using algorithms recently introduced for solving MFGs, we can address the CURL problem more efficiently. View details
    A general class of surrogate functions for stable and efficient reinforcement learning
    Sharan Vaswani
    Simone Totaro
    Robert Müller
    Shivam Garg
    Matthieu Geist
    Marlos C. Machado
    Nicolas Le Roux
    AISTATS (2022)
    Preview abstract Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO, or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives rise to an entire family of surrogate functions. We construct surrogate functions that enable policy improvement guarantees, a property not shared by most existing surrogate functions. Crucially, these guarantees hold regardless of the choice of policy parameterization. Moreover, a particular instantiation of FMA-PG recovers important implementation heuristics (e.g., using forward vs reverse KL divergence) resulting in a variant of TRPO with additional desirable properties. Via experiments on simple reinforcement learning problems, we evaluate the algorithms instantiated by FMA-PG. The proposed framework also suggests an improved variant of PPO, whose robustness and efficiency we empirically demonstrate on the MuJoCo suite. View details
    Offline Reinforcement Learning as Anti-Exploration
    Shideh Rezaeifar
    Nino Vieillard
    Léonard Hussenot
    Olivier Pietquin
    Matthieu Geist
    AAAI (2022)
    Preview abstract Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system. An agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data. This is the converse of exploration in RL, which favors such actions. We thus take inspiration from the literature on bonus-based exploration to design a new offline RL agent. The core idea is to subtract a prediction-based exploration bonus from the reward instead of adding it for exploration. This allows the policy to stay close to the support of the dataset. We connect this approach to a more usual regularization of the learnt policy towards the data. Instantiated with a bonus based on the prediction error of a variational autoencoder, we show that our agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks. View details
    Preview abstract Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a "query decoder" that, given a latent representation of a neural search engine, generates the corresponding query. We show that it is possible to decode a meaningful query from its latent representation and, when moving in the right direction in latent space, to decode a query that retrieves the relevant paragraph. In particular, the query decoder can be useful to understand "what should have been asked" to retrieve a particular paragraph from the collection. We employ the query decoder to generate a large synthetic dataset of query reformulations for MSMarco, leading to improved retrieval performance. On this data, we train a pseudo-relevance feedback (PRF) T5 model for the application of query suggestion that outperforms both query reformulation and PRF information retrieval baselines. View details
    What Matters for Adversarial Imitation Learning?
    Manu Orsini
    Léonard Hussenot
    Damien Vincent
    Sertan Girgin
    Matthieu Geist
    Olivier Pietquin
    Marcin Andrychowicz
    NeurIPS (2021)
    Preview abstract Adversarial imitation learning has become a standard framework for imitation in continuous control. Over the years, several variations of its components were proposed to enhance the performance of the learned policies as well as the sample complexity of the algorithm. In practice, many of these choices are rarely tested all together in rigorous empirical studies. It is therefore difficult to discuss and understand what choices, among the high-level algorithmic options as well as low-level implementation details, matter. To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impacts in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations. We analyze the key results and highlight the most surprising findings. View details
    Hyperparameter Selection for Imitation Learning
    Léonard Hussenot
    Marcin Andrychowicz
    Damien Vincent
    Lukasz Piotr Stafiniak
    Sertan Girgin
    Nikola M Momchev
    Manu Orsini
    Matthieu Geist
    Olivier Pietquin
    ICML (2021)
    Preview abstract We address the issue of tuning hyperparameters (HPs) for imitation learning algorithms when the underlying reward function of the demonstrating expert cannot be observed at any time. The vast literature in imitation learning mostly considers this reward function to be available for HP selection, although this is not a realistic setting. Indeed, would this reward function be available, it should then directly be used for policy training and imitation would not make sense. To tackle this mostly ignored problem, we propose and study, for different representative agents and benchmarks, a number of possible proxies to the return, within an extensive empirical study. We observe that, depending on the algorithm and the environment, some methods allow good performance to be achieved without using the unknown return. View details
    What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
    Marcin Andrychowicz
    Piotr Michal Stanczyk
    Manu Orsini
    Sertan Girgin
    Léonard Hussenot
    Matthieu Geist
    Olivier Pietquin
    Marcin Michalski
    Sylvain Gelly
    ICLR (2021)
    Preview abstract In recent years, reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancy between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement >50 such ``"choices" in a unified on-policy deep actor-critic framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250'000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for the training of on-policy deep actor-critic RL agents. View details
    Evaluating Generative Models using Divergence Frontiers
    Josip Djolonga
    Marco Cuturi
    Sylvain Gelly
    International Conference on Artificial Intelligence and Statistics (2020)
    Preview abstract Despite the tremendous progress in the estimation of generative models, the development of tools for diagnosing their failures and assessing their performance has advanced at a much slower pace. Very recent developments have investigated metrics that quantify which parts of the true distribution is well modeled, and, on the contrary, what the model fails to capture, akin to precision and recall in information retrieval. In this paper we present a general evaluation framework for generative models that measures the trade-off between precision and recall using R\'enyi divergences. Our framework provides a novel perspective on existing techniques and extends them to more general domains. As a key advantage, it allows for efficient algorithms that are directly applicable to continuous distributions directly without discretization. We further showcase the proposed techniques on a set of image synthesis models. View details
    A Commentary on the Unsupervised Learning of Disentangled Representations
    Francesco Locatello
    Stefan Bauer
    Gunnar Rätsch
    Sylvain Gelly
    Bernhard Scholkopf
    AAAI Conference on Artificial Intelligence (2020)
    Preview abstract The goal of the unsupervised learning of disentangled representations is to separate the independent explanatory factors of variation in the data without access to supervision. In this paper, we summarize the results of (Locatello et al. 2019b) and focus on their implications for practitioners. We discuss the theoretical result showing that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases and the practical challenges it entails. Finally, we comment on our experimental findings, highlighting the limitations of state-of-the-art approaches and directions for future research. View details
    Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
    Francesco Locatello
    Stefan Bauer
    Gunnar Rätsch
    Sylvain Gelly
    Bernhard Schölkopf
    International Conference on Machine Learning (2019)
    Preview abstract In recent years, the interest in unsupervised learning of disentangled representations has significantly increased. The key assumption is that real-world data is generated by a few explanatory factors of variation and that these factors can be recovered by unsupervised learning algorithms. A large number of unsupervised learning approaches based on auto-encoding and quantitative evaluation metrics of disentanglement have been proposed; yet, the efficacy of the proposed approaches and utility of proposed notions of disentanglement has not been challenged in prior work. In this paper, we provide a sober look on recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train more than 12000 models covering the six most prominent methods, and evaluate them across six disentanglement metrics in a reproducible large-scale experimental study on seven different data sets. On the positive side, we observe that different methods successfully enforce properties "encouraged" by the corresponding losses. On the negative side, we observe in our study that well-disentangled models seemingly cannot be identified without access to ground-truth labels even if we are allowed to transfer hyperparameters across data sets. Furthermore, increased disentanglement does not seem to lead to a decreased sample complexity of learning for downstream tasks. These results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets. View details
    Are Disentangled Representations Helpful for Abstract Visual Reasoning?
    Francesco Locatello
    Jürgen Schmidhuber
    Advances in Neural Information Processing Systems (2019), pp. 14245-14258
    Preview abstract A disentangled representation encodes information about the salient factors of variation in the data independently. Although it is often argued that this representational format is useful in learning to solve many real-world down-stream tasks, there is little empirical evidence that supports this claim. In this paper, we conduct a large-scale study that investigates whether disentangled representations are more suitable for abstract reasoning tasks. Using two new tasks similar to Raven’s Progressive Matrices, we evaluate the usefulness of the representations learned by 360 state-of-the-art unsupervised disentanglement models. Based on these representations, we train 3600 abstract reasoning models and observe that disentangled representations do in fact lead to better down-stream performance. In particular, they enable quicker learning using fewer samples. View details
    High-Fidelity Image Generation With Fewer Labels
    Michael Tschannen
    Sylvain Gelly
    International Conference on Machine Learning (2019)
    Preview abstract Deep generative models are becoming a cornerstone of modern machine learning. Recent work on conditional generative adversarial net-works has shown that learning complex, high-dimensional distributions over natural images is within reach. While the latest models are able to generate high-fidelity, diverse natural images at high resolution, they rely on a vast quantity of labeled data. In this work we demonstrate how one can benefit from recent work on self- and semi-supervised learning to outperform state-of-the-art on both unsupervised ImageNet synthesis,as well as in the conditional setting. In particular, the proposed approach is able to match the sample quality (as measured by FID) of the current state-of-the art conditional model BigGAN on ImageNet using only 10% of the labels and outperform it using 20% of the labels. View details
    Google Research Football: A Novel Reinforcement Learning Environment
    Karol Kurach
    Piotr Michal Stanczyk
    Michał Zając
    Carlos Riquelme
    Damien Vincent
    Marcin Michalski
    Sylvain Gelly
    AAAI (2019)
    Preview abstract Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner. We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator. The resulting environment is challenging, easy to use and customize, and it is available under a permissive open-source license. We further propose three full-game scenarios of varying difficulty with the Football Benchmarks, we report baseline results for three commonly used reinforcement algorithms (Impala, PPO, and Ape-X DQN), and we also provide a diverse set of simpler scenarios with the Football Academy. View details
    Preview abstract Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. Commonly used evaluation methods, such as Fr\'echet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since they yield one-dimensional scores. We propose a novel definition of precision and recall for distributions which disentangles the divergence into two separate dimensions. The proposed notion is intuitive, retains desirable properties, and naturally leads to an efficient algorithm that can be used to evaluate generative models. We relate this notion to total variation as well as to recent evaluation metrics such as Inception Score and FID. To demonstrate the practical utility of the proposed approach we perform an empirical study on several variants of Generative Adversarial Networks and the Variational Autoencoder. In an extensive set of experiments we show that the proposed metric is able to disentangle the quality of generated samples from the coverage of the target distribution. View details
    Preview abstract Learning useful representations with little or no supervision is a key challenge in artificial intelligence. We provide an in-depth review of recent advances in representation learning with a focus on autoencoder-based models. To organize these results we make use of meta-priors believed useful for downstream tasks, such as disentanglement and hierarchical organization of features. In particular, we uncover three main mechanisms to enforce such properties, namely (i) regularizing the (approximate or aggregate) posterior distribution, (ii) factorizing the encoding and decoding distribution, or (iii) introducing a structured prior distribution. While there are some promising results, implicit or explicit supervision remains a key enabler and all current methods use strong inductive biases and modeling assumptions. Finally, we provide an analysis of autoencoder-based representation learning through the lens of rate-distortion theory and identify a clear tradeoff between the amount of prior knowledge available about the downstream tasks, and how useful the representation is for this task. View details
    One-Shot Coresets: The Case of k-Clustering
    International Conference on Artificial Intelligence and Statistics (2018)
    Preview abstract Scaling clustering algorithms to massive data sets is a challenging task. Recently, several successful approaches based on data summarization methods, such as coresets and sketches, were proposed. While these techniques provide provably good and small summaries, they are inherently problem dependent --- the practitioner has to commit to a fixed clustering objective before even exploring the data. However, can one construct small data summaries for a wide range of clustering problems simultaneously? In this work, we affirmatively answer this question by proposing an efficient algorithm that constructs such one-shot summaries for k-clustering problems while retaining strong theoretical guarantees. View details
    Scalable k-Means Clustering via Lightweight Coresets
    Andreas Krause
    International Conference on Knowledge Discovery and Data Mining (2018)
    Preview abstract Coresets are compact representations of datasets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive data sets. While existing approaches generally only allow for multiplicative approximation errors, we propose a novel notion of lightweight coresets that allows for both multiplicative and additive errors. We provide a single algorithm to construct lightweight coresets for k-Means clustering as well as soft and hard Bregman clustering. The algorithm is substantially faster than existing constructions, embarrassingly parallel and the resulting coresets are smaller. We demonstrate that the proposed method outperforms existing coreset constructions in practice. View details
    Preview abstract A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: The capacity – the number of instances that can fit in memory of each machine – must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physical constraints. We propose a truly scalable approach for distributed submodular maximization when the capacity of each machine is fixed. The proposed framework applies to a broad class of algorithms and a variety of constraints. We provide theoretical guarantees on the approximation factor for any available capacity. We empirically evaluate the proposed algorithm on a variety of data sets and demonstrate that the algorithm achieves performance competitive with the centralized greedy solution. View details
    Distributed and Provably Good Seedings for k-Means in Constant Rounds
    Andreas Krause
    International Conference on Machine Learning (2017)
    Uniform Deviation Bounds for Unbounded Loss Functions like k-Means
    S. Hamed Hassani
    Andreas Krause
    International Conference on Machine Learning (2017)
    Fast and Provably Good Seedings for k-Means
    S. Hamed Hassani
    Andreas Krause
    Neural Information Processing Systems (2016)
    Approximate K-Means++ in Sublinear Time
    S. Hamed Hassani
    Andreas Krause
    AAAI Conference on Artificial Intelligence (2016)
    Linear-Time Outlier Detection via Sensitivity
    Andreas Krause
    International Joint Conference on Artificial Intelligence (2016)
    Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures
    Andreas Krause
    International Conference on Artificial Intelligence and Statistics (2016)
    Coresets for Nonparametric Estimation - the Case of DP-Means
    Andreas Krause
    International Conference on Machine Learning (2015)