Language models are poor learners of directional inference

Tianyi Li
Sabine Weber
Mark Steedman
Findings of the Association for Computational Linguistics: EMNLP 2022, pp. 903-921

Abstract

We examine RoBERTa LMs' competence in directional predicate entailment with prompt fine-tuning. Through analysis, we find that, contrary to previous evidence of success, they show limited capability for directional inference; moreover, existing datasets either ignore directionality or are riddled with spurious correlations, allowing models to overfit to dataset artefacts. In response, we present BoOQA (Boolean Open QA), an extrinsic, robust, multi-lingual evaluation benchmark for directional predicate entailment, independent of existing training sets. On BoOQA, we establish baselines and verify that existing LM-prompting models are not competent directional entailment learners, while entailment graphs suffer from sparsity. We bring the open problem of directional predicate entailment into the spotlight and advocate for research along this line.