Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data

Katja Filippova

Findings of EMNLP 2020

Abstract

Neural text generation (data- or text-to-text) demonstrates remarkable performance when training data is abundant, which for many applications is not the case. To collect a large corpus of parallel data, heuristic rules are often used, but they inevitably let noise into the data, such as phrases in the output which cannot be explained by the input. Consequently, models pick up on the noise and may hallucinate, that is, generate fluent but unsupported text. Our contribution is a simple but powerful technique to control such hallucinations without dismissing any input and without modifying the model architecture. On the WikiBio corpus (Lebret et al., 2016), a particularly noisy dataset, we demonstrate the efficacy of the technique in both an automatic and a human evaluation.
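The abstract does not spell out the mechanism, so the following is only a minimal sketch, under stated assumptions, of one way to "control hallucinations without dismissing any input and without modifying the model architecture". Each training pair is scored for output tokens that the input cannot explain. That score is mapped to a discrete control token prepended to the input, and at inference time the lowest-noise token is requested. The overlap heuristic, the stopword list, the thresholds, and the tag names `<h0>`, `<h1>`, `<h2>` are all hypothetical illustrations, not the paper's exact procedure.

```python
from typing import Tuple

# Hypothetical stopword list: function words are not counted as "unsupported".
STOPWORDS = {"a", "an", "the", "is", "was", "of", "in", "and"}


def unsupported_ratio(source: str, target: str) -> float:
    """Fraction of target content tokens that never appear in the source.

    A crude, assumed proxy for "phrases in the output which cannot be
    explained by the input"; punctuation and stopwords are ignored.
    """
    src_tokens = set(source.lower().split())
    content = [
        tok for tok in target.lower().split()
        if tok.isalnum() and tok not in STOPWORDS
    ]
    if not content:
        return 0.0
    missing = sum(1 for tok in content if tok not in src_tokens)
    return missing / len(content)


def control_token(ratio: float, thresholds: Tuple[float, ...] = (0.1, 0.3)) -> str:
    """Bucket a noise ratio into a hypothetical control tag <h0>, <h1>, ..."""
    for i, limit in enumerate(thresholds):
        if ratio <= limit:
            return f"<h{i}>"
    return f"<h{len(thresholds)}>"


def tag_training_pair(source: str, target: str) -> Tuple[str, str]:
    """Keep every training pair, but prefix the input with its noise bucket."""
    tag = control_token(unsupported_ratio(source, target))
    return f"{tag} {source}", target


def tag_inference_input(source: str) -> str:
    """At test time, ask for the least noisy bucket (ratio 0.0 maps to <h0>)."""
    return f"{control_token(0.0)} {source}"


if __name__ == "__main__":
    src = "name : john smith | born : 1950 | occupation : painter"
    noisy = "john smith ( born 1950 ) is an american painter and sculptor"
    print(tag_training_pair(src, noisy))
    print(tag_inference_input(src))
```

In this toy WikiBio-style pair, "american" and "sculptor" are not supported by the input, so 2 of 7 content tokens are unsupported and the pair is tagged `<h1>` rather than discarded. Because the control signal is just an extra input token, the sequence-to-sequence model itself is unchanged, which is the property the abstract emphasizes.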

Research Areas

Natural language processing
