Multi-lingual and Cross-genre Discourse Unit Segmentation

Peter Bourgonje; Robin Schäfer

doi:10.18653/v1/W19-2714

Abstract

We describe a series of experiments applied to data sets from different languages and genres annotated for coherence relations according to different theoretical frameworks. Specifically, we investigate the feasibility of a unified (theory-neutral) approach toward discourse segmentation, a process which divides a text into minimal discourse units that are involved in a coherence relation. We apply a RandomForest and an LSTM-based approach to all data sets, and we improve over a simple baseline that assumes sentence or clause-like segmentation. Performance, however, varies considerably depending on language and, more importantly, genre, with F-scores ranging from 73.00 to 94.47.
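
To make the baseline and the evaluation metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes segmentation is framed as token-level boundary labeling (1 if a token opens a new discourse unit, 0 otherwise), that the baseline opens a unit only after sentence-final punctuation, and that the F-score is computed over the boundary label. The function names, the punctuation set, and the example tokens and gold labels are all hypothetical.

```python
from typing import List

# Assumption: these tokens mark the end of a sentence.
SENTENCE_FINAL = {".", "!", "?"}


def baseline_boundaries(tokens: List[str]) -> List[int]:
    """Label a token 1 if it opens a new segment, else 0.

    Hypothetical sentence-split baseline: a segment starts at the first
    token and after every sentence-final punctuation token.
    """
    labels = []
    start = True  # the first token always opens a segment
    for tok in tokens:
        labels.append(1 if start else 0)
        start = tok in SENTENCE_FINAL
    return labels


def f_score(gold: List[int], pred: List[int]) -> float:
    """F1 over the positive (segment-start) label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example: the gold annotation also opens a unit at "because",
# a clause-level boundary that the sentence baseline cannot find.
tokens = ["It", "rained", "because", "clouds", "formed", ".", "We", "left", "."]
gold = [1, 0, 1, 0, 0, 0, 1, 0, 0]
pred = baseline_boundaries(tokens)
print(pred, round(f_score(gold, pred), 2))
```

In this toy case the baseline finds both sentence starts but misses the clause-internal boundary, so precision is perfect while recall drops. A learned segmenter would be expected to close that kind of gap.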

Anthology ID: W19-2714
Volume: Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
Month: June
Year: 2019
Address: Minneapolis, MN
Editors: Amir Zeldes, Debopam Das, Erick Maziero Galani, Juliano Desiderato Antonio, Mikel Iruskieta
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 105–114
URL: https://preview.aclanthology.org/nschneid-patch-2/W19-2714/
DOI: 10.18653/v1/W19-2714
Cite (ACL): Peter Bourgonje and Robin Schäfer. 2019. Multi-lingual and Cross-genre Discourse Unit Segmentation. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, pages 105–114, Minneapolis, MN. Association for Computational Linguistics.
Cite (Informal): Multi-lingual and Cross-genre Discourse Unit Segmentation (Bourgonje & Schäfer, NAACL 2019)
PDF: https://preview.aclanthology.org/nschneid-patch-2/W19-2714.pdf
