PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition

Sihao Chen, Senaka Buthpitiya, Alex Fabrikant, Dan Roth, Tal Schuster


Abstract
The widely studied task of Natural Language Inference (NLI) requires a system to recognize whether one piece of text is textually entailed by another, i.e. whether the entirety of its meaning can be inferred from the other. In current NLI datasets and models, textual entailment relations are typically defined on the sentence- or paragraph-level. However, even a simple sentence often contains multiple propositions, i.e. distinct units of meaning conveyed by the sentence. As these propositions can carry different truth values in the context of a given premise, we argue for the need to recognize the textual entailment relation of each proposition in a sentence individually. We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters. Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document, i.e. documents describing the same event or entity. We establish strong baselines for the segmentation and entailment tasks. Through case studies on summary hallucination detection and document-level NLI, we demonstrate that our conceptual framework is potentially useful for understanding and explaining the compositionality of NLI labels.
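The idea of judging each proposition separately can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code or the corpus schema: the names `Label`, `Proposition`, and `sentence_label`, the example sentence, and the aggregation rule are all assumptions made for this sketch. The rule assumed here is that a sentence is contradicted if any proposition is contradicted, entailed only if every proposition is entailed, and neutral otherwise.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Label(Enum):
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass
class Proposition:
    """One unit of meaning from a hypothesis sentence (hypothetical structure)."""
    text: str
    label: Label  # entailment relation with respect to the premise document


def sentence_label(props: List[Proposition]) -> Label:
    """Combine proposition-level labels into one sentence-level label.

    Assumed rule (not taken from the paper): any contradiction makes the
    sentence contradicted; all entailed makes it entailed; else neutral.
    """
    if not props:
        raise ValueError("a sentence needs at least one proposition")
    labels = [p.label for p in props]
    if Label.CONTRADICTION in labels:
        return Label.CONTRADICTION
    if all(label is Label.ENTAILMENT for label in labels):
        return Label.ENTAILMENT
    return Label.NEUTRAL


# Made-up example: "Alice, a chemist born in Paris, won the award in 2020."
example = [
    Proposition("Alice is a chemist", Label.ENTAILMENT),
    Proposition("Alice was born in Paris", Label.NEUTRAL),
    Proposition("Alice won the award in 2020", Label.ENTAILMENT),
]
result = sentence_label(example)
```

In this made-up example two propositions are supported by the premise and one is not mentioned, so the sentence as a whole comes out neutral under the assumed rule, even though most of its content is entailed. A single sentence-level label hides that distinction; the proposition-level labels keep it.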
Anthology ID:
2023.findings-acl.565
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8874–8893
URL:
https://aclanthology.org/2023.findings-acl.565
DOI:
10.18653/v1/2023.findings-acl.565
Cite (ACL):
Sihao Chen, Senaka Buthpitiya, Alex Fabrikant, Dan Roth, and Tal Schuster. 2023. PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8874–8893, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition (Chen et al., Findings 2023)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.565.pdf
Video:
https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-acl.565.mp4