- Ciprian Chelba
- Mia Chen
- Ankur Bapna
- Noam Shazeer
ArXiv, Google Research (2020)
Motivated by the fact that most of the information relevant to the prediction of target tokens is drawn from the source sentence S = s_1, ..., s_{|S|}, we propose truncating the target-side context used for incremental predictions by making a Markov (N-gram) assumption. Experiments on WMT EnDe and EnFr data sets show that the N-gram masked self-attention model loses very little in BLEU score for N values in the range 4, ..., 8, depending on the task.
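As a rough illustration of the truncation described in the abstract, the sketch below builds an N-gram self-attention mask in which each target position attends only to itself and the N-1 preceding target tokens, rather than the full causal prefix. This is not the authors' implementation; the function name and the choice of PyTorch are assumptions made for the example.

```python
import torch

def ngram_causal_mask(seq_len: int, n: int) -> torch.Tensor:
    """Boolean mask: entry [i, j] is True iff query position i may attend
    to key position j under the N-gram (Markov) truncation, i.e.
    max(0, i - n + 1) <= j <= i."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions (column vector)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions (row vector)
    return (j <= i) & (j > i - n)

# Applied to raw attention scores before the softmax, e.g.:
# scores = scores.masked_fill(~ngram_causal_mask(T, n=4), float("-inf"))
```

For N at least the sequence length this reduces to the ordinary causal mask; shrinking N bounds the target-side context needed at each incremental decoding step, which is where the decoding speedup comes from.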