Disagreements in analyses of rhetorical text structure: A new dataset and first analyses

Freya Hewett, Manfred Stede


Abstract
Discourse structure annotation is known to involve a high level of subjectivity, which often results in low inter-annotator agreement. In this paper, we focus on “legitimate disagreements”, by which we mean multiple valid annotations of the same text or text segment. We provide a new dataset of English and German texts, where each text comes with two parallel analyses (both done by well-trained annotators) in the framework of Rhetorical Structure Theory. Using the RST-Tace tool, we build a list of all conflicting annotation decisions and present some statistics for the corpus. We then undertake a qualitative analysis of the disagreements and propose a typology of underlying reasons. From this we derive the need to differentiate two kinds of ambiguities in RST annotation: those that result from inherent “everyday” linguistic ambiguity, and those that arise from specifications in the theory and/or the annotation schemes.
Anthology ID:
2025.law-1.3
Volume:
Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Siyao Peng, Ines Rehbein
Venues:
LAW | WS
Publisher:
Association for Computational Linguistics
Pages:
35–47
URL:
https://preview.aclanthology.org/display_plenaries/2025.law-1.3/
Cite (ACL):
Freya Hewett and Manfred Stede. 2025. Disagreements in analyses of rhetorical text structure: A new dataset and first analyses. In Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025), pages 35–47, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Disagreements in analyses of rhetorical text structure: A new dataset and first analyses (Hewett & Stede, LAW 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.law-1.3.pdf