Hallucinations in Neural Machine Translation
Abstract
Neural machine translation (NMT) systems have reached state-of-the-art performance in translating text and are in wide deployment. Yet little is understood about how these systems function or how they break. Here we show that NMT systems are susceptible to producing highly pathological translations that are completely untethered from the source material, which we term {\it hallucinations}. Such pathological translations are problematic because they deeply undermine user trust and are easy to find with a simple search. We describe a method to generate hallucinations and show that many common variations of the NMT architecture are susceptible to them. We study a variety of approaches to reduce the frequency of hallucinations, including data augmentation, dynamical systems, and regularization techniques, and show that a data augmentation technique significantly reduces hallucination frequency. Finally, we analyze networks that produce hallucinations and show that there are signatures in the attention matrix as well as in the stability measures of the decoder.