Google Research

Framework for Recasting Table-to-Text Generation Data for Tabular Inference

Findings of EMNLP (2022)

Abstract

Prior work on constructing challenging tabular inference data has centered primarily on human annotation or automatic synthetic generation. Both techniques have drawbacks. Human annotation, despite its diversity and superior reasoning, is difficult to scale. Synthetic data, on the other hand, scales well but lacks linguistic and reasoning diversity. In this paper, we address both concerns by presenting a recasting approach that semi-automatically generates tabular NLI instances. We transform the table-to-text dataset ToTTo (Parikh et al., 2020) into a tabular NLI dataset using our proposed framework. We demonstrate the use of our recast data both as an evaluation benchmark and as augmentation data to improve performance on TabFact (Chen et al., 2020b). Furthermore, we test the effectiveness of models trained on our data on the TabFact benchmark in the zero-shot setting.
