TeaForN: Teacher-Forcing with N-grams

Sebastian Alexander Goodman
EMNLP 2020


This paper introduces TeaForN, an extension of the teacher-forcing method to N-grams. Sequence generation models trained with teacher-forcing suffer from problems such as exposure bias and a lack of differentiability across timesteps. TeaForN addresses both of these problems directly by using a stack of N decoders trained to decode along a secondary time axis, which allows model parameters to be updated based on N prediction steps. Unlike other approaches, TeaForN can be used with a wide class of decoder architectures and requires minimal modifications to a standard teacher-forcing setup. Empirically, we show that TeaForN improves model quality and beam efficiency on several sequence generation benchmarks.
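
The idea of decoding along a secondary time axis can be illustrated with a small sketch. This is not the authors' implementation; it is one plausible reading of the abstract under several assumptions that the abstract does not state: the N decoders share parameters, decoder n receives the expected embedding of decoder n-1's predicted distribution as input, decoder n at position t is scored against the gold token at position t + n, and the per-step losses are combined as a weighted sum. The names `toy_decoder` and `teaforn_loss` and all of their parameters are hypothetical, and the toy decoder (a causal running mean followed by a linear projection) merely stands in for a real decoder architecture.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_decoder(inputs, out_proj):
    """Hypothetical stand-in for a causal decoder.

    Position t only sees inputs[0..t]: it averages them and projects
    the result to vocabulary logits with out_proj (vocab x dim).
    """
    running = [0.0] * len(inputs[0])
    logits = []
    for t, x in enumerate(inputs):
        running = [r + xi for r, xi in zip(running, x)]
        state = [r / (t + 1) for r in running]
        logits.append([sum(w * s for w, s in zip(row, state)) for row in out_proj])
    return logits


def teaforn_loss(targets, embed, out_proj, bos_id, n_steps, weights=None):
    """Weighted sum of N cross-entropy losses along the secondary axis.

    Assumption: all n_steps decoders share embed and out_proj.
    Returns (total_loss, list_of_per_step_losses).
    """
    if not 1 <= n_steps <= len(targets):
        raise ValueError("n_steps must be between 1 and len(targets)")
    weights = weights or [1.0] * n_steps
    dim, vocab = len(embed[0]), len(embed)

    # Step 0 is ordinary teacher forcing: gold tokens shifted right.
    inputs = [embed[bos_id]] + [embed[y] for y in targets[:-1]]
    total, per_step = 0.0, []
    for n in range(n_steps):
        probs = [softmax(row) for row in toy_decoder(inputs, out_proj)]
        # Decoder n at position t is scored against the gold token at t + n.
        nll = [-math.log(probs[t][targets[t + n]])
               for t in range(len(targets) - n)]
        step_loss = sum(nll) / len(nll)
        per_step.append(step_loss)
        total += weights[n] * step_loss
        # Next decoder consumes soft predictions (expected embeddings),
        # which keeps the whole stack differentiable in a real framework.
        inputs = [[sum(p[v] * embed[v][d] for v in range(vocab))
                   for d in range(dim)] for p in probs]
    return total, per_step


random.seed(0)
vocab_size, dim = 5, 4
embed = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
out_proj = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
targets = [1, 3, 2, 4, 0]

total, per_step = teaforn_loss(targets, embed, out_proj, bos_id=0, n_steps=3)
print("total loss:", total)
print("per-step losses:", per_step)
```

With `n_steps=1` the function reduces to the usual teacher-forcing loss, which reflects the abstract's claim that only minimal modifications to a standard setup are needed. The sketch computes losses only; in an automatic-differentiation framework, gradients would flow from the later steps back through the soft predictions of the earlier ones.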