TURL: Table Understanding through Representation Learning

Xiang Deng
Huan Sun
Alyssa Whitlock Lees
Cong Yu
47th International Conference on Very Large Data Bases (2021)

Abstract

Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, various tasks in table understanding have made tremendous progress. However, existing work largely relies on heavily-engineered task-specific features and model architectures. In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables. During pre-training, it learns deep contextualized representations on relational tables in an unsupervised manner. Its universal model design, together with the pre-trained representations, can be applied to a wide range of tasks with minimal task-specific fine-tuning. More specifically, we propose a structure-aware Transformer encoder to model the row-column structure of relational tables, and present a new Masked Entity Recovery (MER) objective for pre-training to capture the semantics and knowledge in large-scale unlabeled data. To systematically evaluate TURL, we compile a benchmark that consists of 6 different tasks for table understanding (e.g., relation extraction, cell filling). We show that TURL generalizes well to all these tasks and substantially outperforms existing methods in most cases. Our source code, benchmark, as well as pre-trained models are available online to facilitate future research.
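To make the MER idea concrete, here is a minimal sketch of the masking step such an objective implies: randomly hide some linked entities in a table and ask the model to recover them from the surrounding row, column, and caption context. The function name, masking ratio, and the MASK_ID sentinel are illustrative assumptions, not the paper's actual implementation.

```python
import random

MASK_ID = 0  # hypothetical id reserved for a [MASK] entity placeholder

def mask_entities(entity_ids, mask_prob=0.2, seed=None):
    """Illustrative Masked-Entity-Recovery-style masking (an assumption,
    not TURL's exact code): hide a random subset of a table's linked
    entities so the model must predict them from table context.

    entity_ids: list[int] of linked-entity ids for one table's cells.
    Returns (masked_ids, labels), where labels[i] holds the original id
    at masked positions and -1 elsewhere (ignored in the training loss).
    """
    rng = random.Random(seed)
    masked_ids, labels = [], []
    for eid in entity_ids:
        if rng.random() < mask_prob:
            masked_ids.append(MASK_ID)
            labels.append(eid)   # model is trained to recover this id
        else:
            masked_ids.append(eid)
            labels.append(-1)    # unmasked; excluded from the loss
    return masked_ids, labels

# Example: one column of linked entities from a relational table
ids, labels = mask_entities([1001, 2077, 3154, 4890], mask_prob=0.5, seed=13)
print(ids, labels)
```

In spirit this mirrors BERT's masked language modeling, but over entity cells rather than subword tokens, which is what lets pre-training exploit the row-column structure of relational tables.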