Table-To-Text generation and pre-training with TabT5

Abstract

Encoder-only transformer models have been successfully applied to a range of table understanding tasks, as in TAPAS (Herzig et al., 2020). A major limitation of these architectures is that they are constrained to classification-like tasks such as cell selection or entailment detection. We present TABT5, an encoder-decoder model that generates natural language text based on tables and textual inputs. TABT5 overcomes the encoder-only limitation by incorporating a decoder component, and it leverages the input structure through table-specific embeddings and pre-training. TABT5 achieves new state-of-the-art results in several domains, including spreadsheet formula prediction (15% increase in sequence accuracy), question answering (10% increase in sequence accuracy) and data-to-text generation (2% increase in BLEU).