- Alice Martin
- Guillaume Quispe
- Charles Ollion
- Sylvain Le Corf
- Florian Strub
- Olivier Pietquin
Abstract
This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach to train conditional language models from scratch using only reinforcement learning (RL). Because RL methods scale poorly to large action spaces, we dynamically truncate the vocabulary space using a generic language model. TrufLL thus enables training a language agent solely by interacting with its environment, without any task-specific prior knowledge; it is guided only by a task-agnostic language model. Interestingly, this approach avoids the dependency on labelled datasets and inherently reduces pre-trained policy flaws such as language or exposure biases. We evaluate TrufLL on two visual question generation tasks, for which we report promising results on both performance and language metrics. To our knowledge, it is the first approach that successfully learns a language generation policy (almost) from scratch.
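The core mechanism sketched in the abstract can be illustrated with a toy example: at each generation step, a task-agnostic language model scores the full vocabulary, and the RL policy is restricted to the top-k tokens. The code below is a minimal sketch under stated assumptions; the vocabulary, the stand-in `toy_lm_logits` scorer, and the choice of uniform sampling over the truncated set are illustrative assumptions, not the paper's implementation (TrufLL would use an actual pretrained LM and its own learned policy distribution).

```python
import random

# Illustrative sketch of dynamic vocabulary truncation for RL text
# generation. VOCAB, K, and toy_lm_logits are hypothetical stand-ins.

VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran", "blue"]
K = 3  # size of the truncated action space


def toy_lm_logits(context):
    """Stand-in for a task-agnostic pretrained LM: deterministic
    pseudo-logits derived from the context, one score per token."""
    rng = random.Random(hash(tuple(context)) & 0xFFFFFFFF)
    return [rng.uniform(-1.0, 1.0) for _ in VOCAB]


def truncated_actions(context, k=K):
    """Keep only the k tokens the LM deems most likely here; this
    becomes the RL agent's action space for the current step."""
    logits = toy_lm_logits(context)
    ranked = sorted(range(len(VOCAB)), key=lambda i: logits[i], reverse=True)
    return [VOCAB[i] for i in ranked[:k]]


def policy_step(context):
    """Sample the next token from the truncated set. In the paper's
    setting, the learned policy's distribution would be renormalised
    over these k tokens rather than sampled uniformly."""
    return random.choice(truncated_actions(context))


context = ["the"]
actions = truncated_actions(context)
next_token = policy_step(context)
```

The design point the sketch captures: the LM never dictates the chosen token, it only prunes the action space, so the RL policy still drives generation while the intractable full-vocabulary action space is avoided.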