An Alignment-based Approach to Text Segmentation Similarity Scoring

Gerardo Ocampo Diaz, Jessica Ouyang


Abstract
Text segmentation is a natural language processing task with popular applications, such as topic segmentation, element discourse extraction, and sentence tokenization. Much work has been done to develop accurate segmentation similarity metrics, but even the most advanced metrics used today, B and WindowDiff, exhibit incorrect behavior due to their evaluation of boundaries in isolation. In this paper, we present a new segment-alignment-based approach to segmentation similarity scoring and a new similarity metric A. We show that A does not exhibit the erratic behavior of B and WindowDiff, quantify the likelihood of B and WindowDiff misbehaving through simulation, and discuss the versatility of alignment-based approaches for segmentation similarity scoring. We make our implementation of A publicly available and encourage the community to explore more sophisticated approaches to text segmentation similarity scoring.
Anthology ID:
2022.conll-1.26
Volume:
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Antske Fokkens, Vivek Srikumar
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
374–383
URL:
https://aclanthology.org/2022.conll-1.26
DOI:
10.18653/v1/2022.conll-1.26
Cite (ACL):
Gerardo Ocampo Diaz and Jessica Ouyang. 2022. An Alignment-based Approach to Text Segmentation Similarity Scoring. In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 374–383, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
An Alignment-based Approach to Text Segmentation Similarity Scoring (Ocampo Diaz & Ouyang, CoNLL 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2022.conll-1.26.pdf