PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition

Sihao Chen; Senaka Buthpitiya; Alex Fabrikant; Dan Roth; Tal Schuster

PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition

Sihao Chen

Senaka Buthpitiya

Alex Fabrikant

Dan Roth

Tal Schuster

Findings of ACL 2023

Download Google Scholar

Abstract

The widely studied task of Natural Language Inference (NLI) requires a system to recognize whether one piece of text is textually entailed by another, i.e. whether the entirety of its meaning can be inferred from the other. In current NLI corpora and models, the textual entailment relation is typically defined on the sentence- or paragraph- level. However, even a simple sentence often contains multiple propositions, i.e. distinct units of meaning conveyed by the sentence. These propositions can carry different truth values in the context of a given premise, and we argue for the need to identify such fine-grained textual entailment relations.

To facilitate the study on proposition-level segmentation and entailment, we propose PropSegmEnt, a corpus of over 35K propositions annotated by trained expert annotators.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document, i.e. documents describing the same event or entity.
We establish strong baselines for the segmentation and entailment tasks.
We demonstrate that our conceptual framework is potentially useful for understanding and explaining the compositionality of NLI labels.

Research Areas

Natural language processing

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs