Google Research

TeaForN: Teacher-Forcing with N-grams


This paper introduces TeaForN, an extension of the teacher-forcing method to N-grams. Sequence generation models trained with teacher-forcing suffer from problems such as exposure bias and a lack of differentiability across timesteps. TeaForN addresses both problems directly by using a stack of N decoders trained to decode along a secondary time axis, which allows model-parameter updates based on N prediction steps. Unlike other approaches, TeaForN can be used with a wide class of decoder architectures and requires minimal modifications to a standard teacher-forcing setup. Empirically, we show that TeaForN boosts model quality and beam efficiency on several sequence generation benchmarks.
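To make the idea of decoding along a secondary time axis more concrete, here is a minimal NumPy sketch of an N-step teacher-forcing loss. It is an illustration under stated assumptions, not the paper's implementation. The first stage is ordinary teacher forcing on gold tokens. Each later stage takes the previous stage's predictions as input and is scored against the target one step further ahead. The toy running-mean decoder, the sharing of parameters across stages, the use of an expected embedding to feed predictions forward, the `discount` weighting, and all function names are assumptions made for this example.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def toy_decoder(inputs, W, E):
    """Hypothetical causal decoder: position t sees the mean of inputs[0..t]."""
    steps = np.arange(1, inputs.shape[0] + 1)[:, None]
    context = np.cumsum(inputs, axis=0) / steps
    hidden = np.tanh(context @ W)
    return hidden @ E.T  # vocabulary logits, using tied embeddings (assumption)

def teaforn_style_loss(tokens, W, E, n_steps=2, discount=0.5, bos_id=0):
    """Discounted sum of cross-entropy losses over n_steps stacked stages."""
    tokens = np.asarray(tokens)
    # Stage 0 is standard teacher forcing: gold previous tokens are the input.
    prev = np.concatenate(([bos_id], tokens[:-1]))
    inputs = E[prev]
    per_stage = []
    for n in range(n_steps):
        targets = tokens[n:]          # stage n, position t predicts tokens[t + n]
        if len(targets) == 0:
            break
        inputs = inputs[:len(targets)]
        probs = softmax(toy_decoder(inputs, W, E))
        nll = -np.log(probs[np.arange(len(targets)), targets])
        per_stage.append(float(nll.mean()))
        # Assumption: feed the next stage the expected embedding of this
        # stage's prediction, which keeps the whole stack differentiable.
        inputs = probs @ E
    weights = discount ** np.arange(len(per_stage))
    return float(np.dot(weights, per_stage)), per_stage

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 4))   # 5-token vocabulary, 4-dim embeddings
W = rng.normal(size=(4, 4))
total, stages = teaforn_style_loss([1, 2, 3, 4], W, E, n_steps=3)
print(total, stages)
```

With `n_steps=1` the function reduces to a plain teacher-forcing loss, which reflects the abstract's claim that only small changes to a standard setup are needed. In a real model the loss would be differentiated with an autodiff framework so that gradients flow through all stages.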
