Unsupervised Natural Language Generation with Denoising Autoencoders

Scott Roy
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (2018), pp. 3922-3929

Abstract

Generating text from structured data is important for various tasks such as question answering and dialog systems. The task of Natural Language Generation (NLG) is to generate fluent sentences that include all of the information given by some structured data. We show that, without any supervision and based only on unlabeled text, we can build an NLG system whose performance is comparable to supervised approaches. In our approach, we treat the structured data as a corrupt representation of the desired output and use a denoising autoencoder to reconstruct the sentence. We show how to introduce noise into the training data to build a denoising autoencoder that can generate correct sentences from structured data. Further, using bilingual out-of-domain data, we show how to train an unsupervised NLG system that can generate sentences in different languages within a single network.
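
The core idea — that structured data looks like a corrupted sentence — can be made concrete with an illustrative noising step. The sketch below is not the paper's exact corruption procedure; the word lists and shuffling are assumptions for illustration. It corrupts a fluent sentence into a structured-data-like token bag by dropping function words and discarding word order; a denoising autoencoder trained on such (corrupted, original) pairs learns to map data-like fragments back to fluent text.

```python
import random

# Common English function words to drop; this list is illustrative, not
# the one used in the paper.
FUNCTION_WORDS = {"the", "a", "an", "is", "are", "was", "in", "on", "at",
                  "of", "to", "and"}

def corrupt(sentence, seed=0):
    """Corrupt a fluent sentence into a structured-data-like fragment.

    Dropping function words and shuffling mimics the gap between a
    key-value record and a natural sentence: the content is present,
    but fluency markers and word order are gone.
    """
    rng = random.Random(seed)
    tokens = sentence.lower().split()
    # Keep only content-bearing tokens.
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    # Structured data carries no reliable word order.
    rng.shuffle(content)
    return content

# A training pair for the denoising autoencoder: the model reads the
# corrupted tokens and is trained to reconstruct the original sentence.
original = "The restaurant is in the city centre"
pair = (corrupt(original), original)
```

During training the autoencoder only ever sees unlabeled sentences plus this synthetic corruption; at inference time, real structured data is fed in place of the corrupted tokens.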