# Robert Dadashi

### Authored Publications

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

Paul Roit, Johan Ferret, Geoffrey Cideron, Matthieu Geist, Sertan Girgin, Léonard Hussenot, Nikola Momchev, Piotr Stanczyk, Nino Vieillard, Olivier Pietquin

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (2023), pp. 6252–6272

Despite their seeming success, contemporary grounded text generation systems often generate text that is factually inconsistent with their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work we leverage recent progress on textual entailment models to directly address this problem for abstractive summarization systems. We use reinforcement learning with reference-free, textual-entailment rewards to optimize for factual consistency and explore the ensuing trade-offs, as improved consistency may come at the cost of less informative or more extractive summaries. Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience and conciseness of the generated summaries.
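The reference-free reward signal described above can be sketched in a few lines. The `entailment_prob` function here is a toy stand-in (token overlap with the source) for a trained textual-entailment model; the function names and the heuristic are illustrative assumptions, not the paper's implementation.

```python
def entailment_prob(source: str, summary: str) -> float:
    """Hypothetical NLI scorer: probability that `source` entails `summary`.
    A real system would query a trained entailment model here; this toy
    proxy counts the fraction of summary tokens supported by the source."""
    src_tokens = set(source.lower().split())
    sum_tokens = summary.lower().split()
    if not sum_tokens:
        return 0.0
    return sum(t in src_tokens for t in sum_tokens) / len(sum_tokens)

def entailment_reward(source: str, summary: str) -> float:
    # Reference-free: no gold summary is needed, only the source article.
    return entailment_prob(source, summary)

faithful = entailment_reward("the cat sat on the mat", "the cat sat")
hallucinated = entailment_reward("the cat sat on the mat", "the dog ran")
```

An RL fine-tuning loop would then use this scalar as the per-summary reward, so that factually unsupported generations are penalized relative to faithful ones.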

Offline Reinforcement Learning with On-Policy Q-Function Regularization

Laixi Shi, Yuejie Chi, Matthieu Geist

European Conference on Machine Learning (ECML) (2023)

The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly or explicitly regularizing the learning policy towards the behavior policy, which is hard to estimate reliably in practice. In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily by a SARSA-style estimate, and handles the extrapolation error more straightforwardly. We propose two algorithms that take advantage of the estimated Q-function through regularization, and demonstrate that they exhibit strong performance on the D4RL benchmarks.
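A SARSA-style estimate of the behavior policy's Q-function, the quantity the abstract proposes to regularize towards, can be sketched in tabular form. This is a minimal illustration of that kind of estimate from logged transitions only, not the paper's exact algorithm.

```python
import numpy as np

def sarsa_q_estimate(transitions, n_states, n_actions, gamma=0.9,
                     lr=0.1, sweeps=200):
    """SARSA-style Q-estimate of the behavior policy from a fixed dataset
    of (s, a, r, s_next, a_next) tuples -- no environment interaction."""
    q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for s, a, r, s2, a2 in transitions:
            # Bootstrap on the logged next action (SARSA target), so q
            # approximates Q of the behavior policy, not the greedy Q*.
            q[s, a] += lr * (r + gamma * q[s2, a2] - q[s, a])
    return q

# A one-transition dataset: a self-loop in state 0 under action 0, reward 1.
q = sarsa_q_estimate([(0, 0, 1.0, 0, 0)], n_states=1, n_actions=2)
```

A learning policy can then be regularized towards this estimate, e.g. by penalizing actions whose estimated behavior-Q is low, which is the premise the abstract describes.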

Offline Reinforcement Learning as Anti-Exploration

Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin, Matthieu Geist

AAAI (2022)

Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system. An agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data. This is the converse of exploration in RL, which favors such actions. We thus take inspiration from the literature on bonus-based exploration to design a new offline RL agent. The core idea is to subtract a prediction-based exploration bonus from the reward instead of adding it for exploration. This allows the policy to stay close to the support of the dataset. We connect this approach to a more usual regularization of the learnt policy towards the data. Instantiated with a bonus based on the prediction error of a variational autoencoder, we show that our agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks.
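The core idea, subtracting a prediction-based bonus rather than adding it, can be sketched as follows. The paper uses a variational autoencoder's reconstruction error as the bonus; the nearest-neighbor distance below is an illustrative proxy, and all names are assumptions.

```python
import numpy as np

def prediction_bonus(action, dataset_actions):
    """Toy prediction-error bonus: distance to the nearest dataset action.
    Stands in for the VAE reconstruction error the paper actually uses."""
    return float(np.min(np.linalg.norm(dataset_actions - action, axis=1)))

def anti_exploration_reward(r, action, dataset_actions, alpha=1.0):
    # Exploration methods *add* a novelty bonus; here we *subtract* it,
    # penalizing actions far from the support of the dataset.
    return r - alpha * prediction_bonus(action, dataset_actions)

data = np.array([[0.0, 0.0], [1.0, 0.0]])
in_support = anti_exploration_reward(1.0, np.array([0.0, 0.0]), data)
out_of_support = anti_exploration_reward(1.0, np.array([5.0, 5.0]), data)
```

In-support actions keep their reward untouched, while out-of-support actions are penalized in proportion to how unpredictable they are from the data.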

Continuous Control with Action Quantization from Demonstrations

Léonard Hussenot, Damien Vincent, Sertan Girgin, Matthieu Geist, Olivier Pietquin

International Conference on Machine Learning (ICML) (2022)

In this paper, we propose a novel Reinforcement Learning (RL) framework for problems with continuous action spaces: Action Quantization from Demonstrations (AQuaDem). The proposed approach consists of learning a discretization of continuous action spaces from human demonstrations. This discretization returns a set of plausible actions (in light of the demonstrations) for each input state, thus capturing the priors of the demonstrator and their multimodal behavior. By discretizing the action space, any discrete-action deep RL technique can be readily applied to the continuous control problem. Experiments show that the proposed approach outperforms state-of-the-art methods such as SAC in the RL setup, and GAIL in the Imitation Learning setup. We provide a website with interactive videos: https://google-research.github.io/aquadem/ and make the code available: https://github.com/google-research/google-research/tree/master/aquadem.
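The discretization step can be sketched with a plain k-means over demonstration actions. Note the simplification: AQuaDem itself learns state-conditioned action candidates with a neural network, whereas this global clustering ignores the state; all names here are illustrative assumptions.

```python
import numpy as np

def quantize_actions(demo_actions, k, iters=50, seed=0):
    """Discretize a continuous action space from demonstrations with a
    plain k-means (a state-independent simplification of AQuaDem)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(demo_actions), size=k, replace=False)
    centroids = demo_actions[idx].astype(float)
    for _ in range(iters):
        # Assign each demonstration action to its nearest centroid.
        d = np.linalg.norm(demo_actions[:, None] - centroids[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = demo_actions[labels == j].mean(axis=0)
    return centroids

# Bimodal 1-D demonstrations: two behavior modes around -1 and +1.
rng = np.random.default_rng(1)
demos = np.concatenate([rng.normal(-1.0, 0.05, (50, 1)),
                        rng.normal(+1.0, 0.05, (50, 1))])
candidates = quantize_actions(demos, k=2)
```

Any discrete-action agent (e.g. DQN) can then act in the continuous environment by selecting an index into `candidates`, which is the reduction the abstract describes.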

Learning Energy Networks with Generalized Fenchel-Young Losses

Felipe Llinares, Léonard Hussenot, Matthieu Geist

Neural Information Processing Systems (NeurIPS) (2022)

Energy-based models, a.k.a. energy networks, perform inference by optimizing an energy function, typically parametrized by a neural network. This allows one to capture potentially complex relationships between inputs and outputs. To learn the parameters of the energy function, the solution to that optimization problem is typically fed into a loss function. The key challenge for training energy networks lies in computing loss gradients, as this typically requires argmin/argmax differentiation. In this paper, building upon a generalized notion of conjugate function, which replaces the usual bilinear pairing with a general energy function, we propose generalized Fenchel-Young losses, a natural loss construction for learning energy networks. Our losses enjoy many desirable properties and their gradients can be computed efficiently without argmin/argmax differentiation. We also prove the calibration of their excess risk in the case of linear-concave energies. We demonstrate our losses on multilabel classification and imitation learning tasks.
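In symbols, the construction the abstract describes replaces the bilinear pairing in the usual convex conjugate with a general energy $E(\theta, y)$. The following is a sketch of that construction, with $\Omega$ a regularizer on outputs:

```latex
% Generalized conjugate and generalized Fenchel-Young loss:
\Omega^{*}_{E}(\theta) = \max_{y \in \mathcal{Y}} \; E(\theta, y) - \Omega(y),
\qquad
L_{E}(\theta; y) = \Omega^{*}_{E}(\theta) + \Omega(y) - E(\theta, y) \;\ge\; 0.

% Gradient via Danskin's theorem -- no argmax differentiation needed:
\nabla_\theta L_{E}(\theta; y)
  = \nabla_\theta E(\theta, \hat{y}) - \nabla_\theta E(\theta, y),
\qquad
\hat{y} \in \operatorname*{argmax}_{y' \in \mathcal{Y}} \; E(\theta, y') - \Omega(y').
```

Since the gradient only evaluates $\nabla_\theta E$ at the inner maximizer $\hat{y}$, no differentiation through the argmax is required, matching the abstract's claim.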

What Matters for Adversarial Imitation Learning?

Manu Orsini, Léonard Hussenot, Damien Vincent, Sertan Girgin, Matthieu Geist, Olivier Pietquin, Marcin Andrychowicz

NeurIPS (2021)

Adversarial imitation learning has become a standard framework for imitation in continuous control. Over the years, several variations of its components were proposed to enhance the performance of the learned policies as well as the sample complexity of the algorithm. In practice, many of these choices are rarely tested all together in rigorous empirical studies. It is therefore difficult to discuss and understand what choices, among the high-level algorithmic options as well as low-level implementation details, matter. To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impacts in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations. We analyze the key results and highlight the most surprising findings.

Offline Reinforcement Learning with Pseudometric Learning

Shideh Rezaeifar, Nino Vieillard, Léonard Hussenot, Olivier Pietquin, Matthieu Geist

ICML (2021)

Offline Reinforcement Learning methods seek to learn a policy from logged transitions of an environment, without any interaction. In the presence of function approximation, and under the assumption of limited coverage of the state-action space of the environment, it is necessary to constrain the policy to visit state-action pairs "close" to the support of logged transitions. In this work, we propose an iterative procedure to learn a pseudometric from logged transitions, and use it to define this notion of closeness. We show its convergence guarantees and extend it to the sampled function approximation setting. We then use this pseudometric to define a new look-up based malus in an actor-critic algorithm: this encourages the actor to stay close, in terms of the defined pseudometric, to the support of logged transitions. Finally, we evaluate the method against hand manipulation and locomotion tasks.

Hyperparameter Selection for Imitation Learning

Léonard Hussenot, Marcin Andrychowicz, Damien Vincent, Lukasz Piotr Stafiniak, Sertan Girgin, Nikola M Momchev, Manu Orsini, Matthieu Geist, Olivier Pietquin

ICML (2021)

We address the issue of tuning hyperparameters (HPs) for imitation learning algorithms when the underlying reward function of the demonstrating expert cannot be observed at any time. The vast literature in imitation learning mostly considers this reward function to be available for HP selection, although this is not a realistic setting. Indeed, were this reward function available, it could be used directly for policy training, and imitation would not make sense. To tackle this mostly ignored problem, we propose and study, in an extensive empirical study across representative agents and benchmarks, a number of possible proxies for the return. We observe that, depending on the algorithm and the environment, some methods allow good performance to be achieved without using the unknown return.

Statistics and Samples in Distributional Reinforcement Learning

Mark Rowland, Saurabh Kumar, Remi Munos, Marc Bellemare, Will Dabney

Proceedings of the 36th International Conference on Machine Learning, ICML (2019), pp. 5528-5536

The Value Function Polytope in Reinforcement Learning

Adrien Ali Taiga, Nicolas Le Roux, Marc G. Bellemare

Proceedings of the 36th International Conference on Machine Learning, ICML (2019), pp. 1486-1495

We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner and Ziegler, 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem, which shows that the value functions of policies constrained on all but one state describe a line segment. Finally, we use this novel perspective and introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms.
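The objects the abstract studies are easy to compute exactly in the tabular case: each policy's value function is a point in R^(number of states), and the deterministic policies give the candidate vertices of the polytope. A minimal sketch, using a hypothetical 2-state, 2-action MDP:

```python
from itertools import product
import numpy as np

def value_of_policy(P, r, pi, gamma=0.9):
    """Exact V^pi = (I - gamma * P^pi)^{-1} r^pi for a tabular MDP.
    P[s, a] is the next-state distribution, r[s, a] the reward."""
    n = P.shape[0]
    P_pi = P[np.arange(n), pi]   # transition matrix under pi, shape (n, n)
    r_pi = r[np.arange(n), pi]   # reward vector under pi, shape (n,)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

# Hypothetical MDP: action 0 always leads to state 0, action 1 to state 1;
# reward 1 is collected only when acting from state 1.
P = np.zeros((2, 2, 2))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
r = np.array([[0.0, 0.0], [1.0, 1.0]])

# Value functions of the four deterministic policies: points in R^2 whose
# span (over all stochastic policies) fills the value polytope.
vertices = [value_of_policy(P, r, np.array(pi))
            for pi in product([0, 1], repeat=2)]
```

Plotting such vertices (and the values of interpolating stochastic policies) reproduces, on a small scale, the kind of visualization the paper introduces.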

A Geometric Perspective on Optimal Representations for Reinforcement Learning

Marc G. Bellemare, Will Dabney, Adrien Ali Taïga, Nicolas Le Roux, Tor Lattimore, Clare Lyle

NeurIPS (2019)

We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation of the value function of all stationary policies for a given environment. We show that this optimization reduces to making accurate predictions regarding a special class of value functions which we call adversarial value functions (AVFs). We demonstrate that using value functions as auxiliary tasks corresponds to an expected-error relaxation of our formulation, with AVFs a natural candidate, and identify a close relationship with proto-value functions (Mahadevan, 2005). We highlight characteristics of AVFs and their usefulness as auxiliary tasks in a series of experiments on the four-room domain.
