Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

Sebastian Hofstätter; Jiecao Chen; Karthik Raman; Hamed Zamani

Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

Sebastian Hofstätter

Jiecao Chen

Karthik Raman

Hamed Zamani

ICML 2022 Workshop on Knowledge Retrieval and Language Models

Download Google Scholar

Abstract

This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks. We propose to clean the training set by utilizing a distinct property of knowledge-intensive generation: The connection of query-answer pairs to items in the knowledge base. We filter training examples via a threshold of confidence on the relevance labels, whether a pair is answerable by the knowledge base or not.We train a single Fusion-in-Decoder (FiD) generator on seven combined tasks of the KILT benchmark. The experimental results suggest that our simple yet effective approach substantially improves competitive baselines on two strongly imbalanced tasks; and shows either smaller improvements or no significant regression on the remaining tasks. Furthermore, we demonstrate our multi-task training with relevance label sampling scales well with increased model capacity and achieves state-of-the-art results in five out of seven KILT tasks.

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs