Projecting Annotations for Discourse Relations: Connective Identification for Low-Resource Languages

Peter Bourgonje, Pin-Jie Lin


Abstract
We present a pipeline for multi-lingual Shallow Discourse Parsing. The pipeline exploits Machine Translation and Word Alignment: it translates any non-English input text into English, applies an English discourse parser, and projects the relations found onto the original input text through word alignments. The purpose of the pipeline is to provide rudimentary discourse relation annotations for low-resource languages. To get an idea of its performance, we evaluate it on the sub-task of discourse connective identification for several languages for which gold data are available. We experiment with different setups of our modular pipeline architecture and analyze intermediate results. Our code is available on GitHub.
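The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `project_spans`, the alignment format (pairs of source-token index and English-token index), and the German example sentence are all assumptions made for illustration. The sketch assumes that translation, word alignment, and English connective identification have already been run, and shows only how English token spans could be mapped back onto the original tokens.

```python
from collections import defaultdict


def project_spans(alignment, english_spans):
    """Map English token spans onto source-language token indices.

    alignment: iterable of (source_index, english_index) pairs (assumed format).
    english_spans: list of lists of English token indices, one list per connective.
    Returns one sorted list of source token indices per English span;
    a span with no aligned tokens yields an empty list.
    """
    en_to_src = defaultdict(set)
    for src_idx, en_idx in alignment:
        en_to_src[en_idx].add(src_idx)

    projected = []
    for span in english_spans:
        src_indices = set()
        for en_idx in span:
            # Unaligned English tokens contribute nothing to the projection.
            src_indices |= en_to_src.get(en_idx, set())
        projected.append(sorted(src_indices))
    return projected


# Hypothetical example: German input and its English translation.
src_tokens = ["Ich", "blieb", "zu", "Hause", ",", "weil", "es", "regnete", "."]
en_tokens = ["I", "stayed", "at", "home", "because", "it", "rained", "."]

# Hand-written word alignment; the comma (source index 4) is unaligned.
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (5, 4), (6, 5), (7, 6), (8, 7)]

# Suppose the English parser marked "because" (English index 4) as a connective.
projected = project_spans(alignment, [[4]])
print([[src_tokens[i] for i in span] for span in projected])  # [['weil']]
```

In this toy case the English connective "because" projects onto the German token "weil". A real alignment may be one-to-many or leave a connective unaligned, which is why the sketch returns a list of indices (possibly empty) per span.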
Anthology ID:
2024.codi-1.4
Volume:
Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024)
Month:
March
Year:
2024
Address:
St. Julians, Malta
Editors:
Michael Strube, Chloe Braud, Christian Hardmeier, Junyi Jessy Li, Sharid Loaiciga, Amir Zeldes, Chuyuan Li
Venues:
CODI | WS
Publisher:
Association for Computational Linguistics
Pages:
39–49
URL:
https://aclanthology.org/2024.codi-1.4
Cite (ACL):
Peter Bourgonje and Pin-Jie Lin. 2024. Projecting Annotations for Discourse Relations: Connective Identification for Low-Resource Languages. In Proceedings of the 5th Workshop on Computational Approaches to Discourse (CODI 2024), pages 39–49, St. Julians, Malta. Association for Computational Linguistics.
Cite (Informal):
Projecting Annotations for Discourse Relations: Connective Identification for Low-Resource Languages (Bourgonje & Lin, CODI-WS 2024)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2024.codi-1.4.pdf
Supplementary material:
 2024.codi-1.4.SupplementaryMaterial.zip