Natural Language Generation (Almost) from Scratch with Truncated Reinforcement Learning

Alice Martin
Guillaume Quispe
Charles Ollion
Sylvain Le Corf
Florian Strub
Olivier Pietquin
Proc. of AAAI 2022


This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach to train conditional language models from scratch by only using reinforcement learning (RL). As RL methods scale poorly to large action spaces, we dynamically truncate the vocabulary space using a generic language model. TrufLL thus enables training a language agent by solely interacting with its environment, without any task-specific prior knowledge; it is only guided by a task-agnostic language model. Interestingly, this approach avoids the dependency on labelled datasets and inherently reduces pre-trained policy flaws such as language or exposure biases. We evaluate TrufLL on two visual question generation tasks, for which we report promising results on both performance and language metrics. To our knowledge, it is the first approach that successfully learns a language generation policy (almost) from scratch.
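The core mechanism described above, restricting the RL agent's action space to the few tokens a generic language model deems plausible, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `truncate_actions`, the top-k truncation rule, and the toy logits are all assumptions made for exposition.

```python
import numpy as np

def truncate_actions(lm_logits, k=5):
    """Keep only the k tokens most likely under a task-agnostic LM.

    lm_logits: next-token logits from a generic language model.
    Returns indices forming the truncated action space
    (hypothetical helper, not the paper's exact procedure).
    """
    return np.argsort(lm_logits)[::-1][:k]

# Toy example: a 10-word vocabulary with random "LM" logits.
rng = np.random.default_rng(0)
logits = rng.normal(size=10)
valid_actions = truncate_actions(logits, k=3)
# The RL policy would then sample its next word only from
# `valid_actions` instead of the full vocabulary.
```

At each generation step the truncation is recomputed from the LM's distribution over next tokens, so the agent explores a small, dynamically changing subset of the vocabulary rather than the full action space.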