Bilingual Rhetorical Structure Parsing with Large Parallel Annotations

Elena Chistova


Abstract
Discourse parsing is a crucial task in natural language processing that aims to reveal the higher-level relations in a text. Despite growing interest in cross-lingual discourse parsing, challenges persist due to limited parallel data and inconsistencies in how Rhetorical Structure Theory (RST) is applied across languages and corpora. To address this, we introduce a parallel Russian annotation for the large and diverse English GUM RST corpus. Leveraging recent advances, our end-to-end RST parser achieves state-of-the-art results on both English and Russian corpora. It is effective in both monolingual and bilingual settings and transfers successfully even when second-language annotation is limited. To the best of our knowledge, this work is the first to evaluate the potential of cross-lingual end-to-end RST parsing on a manually annotated parallel corpus.
Anthology ID: 2024.findings-acl.577
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9689–9706
URL: https://preview.aclanthology.org/fix-sig-urls/2024.findings-acl.577/
DOI: 10.18653/v1/2024.findings-acl.577
Cite (ACL): Elena Chistova. 2024. Bilingual Rhetorical Structure Parsing with Large Parallel Annotations. In Findings of the Association for Computational Linguistics: ACL 2024, pages 9689–9706, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Bilingual Rhetorical Structure Parsing with Large Parallel Annotations (Chistova, Findings 2024)
PDF: https://preview.aclanthology.org/fix-sig-urls/2024.findings-acl.577.pdf