Translate & Fill: Improving Zero-Shot Multilingual Semantic Parsing by Generating Synthetic Data
Abstract
While multilingual pretrained language models (LMs) fine-tuned on a single language have shown substantial cross-lingual task transfer capabilities, a wide performance gap remains on semantic parsing tasks between zero-shot transfer and training with target-language supervision. In this paper, we propose a novel Translate-and-Fill (TaF) method for producing silver training data for a multilingual semantic parser. This method simplifies the popular Translate-Align-Project (TAP) pipeline and consists of a sequence-to-sequence filler model that constructs a full parse conditioned on an utterance and a view of the same parse. Our filler is trained on English data only but can accurately complete instances in other languages (i.e., translations of the English training utterances) in a zero-shot fashion. Experimental results on multiple multilingual semantic parsing datasets show that high-capacity multilingual pretrained LMs achieve remarkable zero-shot performance, and that, with the help of our synthetic data, they reach accuracy competitive with similar systems that rely on traditional alignment techniques.