Jump to Content

q2d: Automatic Dialog Generation to Improve Models' Query Generation

Enav Weinreb
Ido Hakimi
Shlomi Cohen-Ganor
Yoad Lewenberg
EMNLP 2023 (2023)

Abstract

We propose q2d: an automatic data generation pipeline that generates information-seeking dialogues based on questions. We apply our method to create conversational versions of questions answering datasets, which we release as a new dataset. We use this data to improve query generation models, which communicate with an external search APIs to generate factual responses. Unlike previous approaches, which relied on human annotators, our method allows to automatically generate labeled dialogues with better control and scale. In experiments, we demonstrate that: (1) Models trained on our synthetic data produce results comparable to those trained on natural data; (2) Our generated datasets are effective as a benchmark and as a training signal that generalizes to human-annotated test sets. We also provide an extensive analysis of the quality and factuality of the generated datasets. Our studies indicate that our automatic dialogue generation pipeline is effective at improving query generation and factuality.