PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition

Sihao Chen

Senaka Buthpitiya

Alex Fabrikant

Dan Roth

Tal Schuster

Findings of ACL 2023


Abstract

The widely studied task of Natural Language Inference (NLI) requires a system to recognize whether one piece of text is textually entailed by another, i.e., whether the entirety of its meaning can be inferred from the other. In current NLI corpora and models, the textual entailment relation is typically defined at the sentence or paragraph level. However, even a simple sentence often contains multiple propositions, i.e., distinct units of meaning conveyed by the sentence. These propositions can carry different truth values in the context of a given premise, and we argue for the need to identify such fine-grained textual entailment relations. To facilitate the study of proposition-level segmentation and entailment, we propose PropSegmEnt, a corpus of over 35K propositions annotated by trained expert annotators. The dataset is structured around two tasks: (1) segmenting the sentences of a document into a set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically aligned document, i.e., a document describing the same event or entity. We establish strong baselines for the segmentation and entailment tasks. We demonstrate that our conceptual framework is potentially useful for understanding and explaining the compositionality of NLI labels.
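To make the two tasks concrete, the following is a minimal Python sketch of how proposition-level labels for one sentence might be represented and combined into a single sentence-level NLI label. The `Proposition` class, the three label strings, the example sentence, and the `compose_sentence_label` rule are all illustrative assumptions; they are not the corpus schema or the method described in the paper.

```python
from dataclasses import dataclass
from typing import List

# Assumed three-way label set, as in standard NLI; not taken from the corpus.
LABELS = ("entailment", "neutral", "contradiction")


@dataclass
class Proposition:
    """One unit of meaning segmented from a hypothesis sentence (hypothetical type)."""
    text: str
    label: str  # relation of this proposition to the premise document


def compose_sentence_label(props: List[Proposition]) -> str:
    """Combine proposition labels into one sentence-level label.

    Assumed heuristic: any contradicted proposition makes the sentence a
    contradiction; the sentence is entailed only if every proposition is
    entailed; otherwise it is neutral.
    """
    labels = [p.label for p in props]
    for label in labels:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
    if "contradiction" in labels:
        return "contradiction"
    if labels and all(label == "entailment" for label in labels):
        return "entailment"
    return "neutral"


# Made-up example: one sentence segmented into two propositions, each
# labeled against a topically aligned premise document.
sentence_props = [
    Proposition("The bridge opened in 1932.", "entailment"),
    Proposition("The bridge was designed by a local engineer.", "neutral"),
]
sentence_label = compose_sentence_label(sentence_props)
```

Under this assumed rule the example sentence is neutral as a whole even though one of its propositions is entailed, which is the kind of distinction a single sentence-level label hides.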

Research Areas

Natural Language Processing