Delexicalised Multilingual Discourse Segmentation for DISRPT 2021 and Tense, Mood, Voice and Modality Tagging for 11 Languages

Tillmann Dönicke


Abstract
This paper describes our participating system for the Shared Task on Discourse Segmentation and Connective Identification across Formalisms and Languages. Key features of the presented approach are the formulation as a clause-level classification task, a language-independent feature inventory based on Universal Dependencies grammar, and composite-verb-form analysis. The achieved F1 is 92% for German and English and lower for other languages. The paper also presents a clause-level tagger for grammatical tense, aspect, mood, voice and modality in 11 languages.
Anthology ID:
2021.disrpt-1.4
Volume:
Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venues:
DISRPT | EMNLP
Publisher:
Association for Computational Linguistics
Pages:
33–45
URL:
https://aclanthology.org/2021.disrpt-1.4
DOI:
10.18653/v1/2021.disrpt-1.4
Cite (ACL):
Tillmann Dönicke. 2021. Delexicalised Multilingual Discourse Segmentation for DISRPT 2021 and Tense, Mood, Voice and Modality Tagging for 11 Languages. In Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021), pages 33–45, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Delexicalised Multilingual Discourse Segmentation for DISRPT 2021 and Tense, Mood, Voice and Modality Tagging for 11 Languages (Dönicke, DISRPT 2021)
PDF:
https://preview.aclanthology.org/update-css-js/2021.disrpt-1.4.pdf
Data:
DISRPT2021, Universal Dependencies