Abstract
We describe a series of experiments applied to data sets from different languages and genres annotated for coherence relations according to different theoretical frameworks. Specifically, we investigate the feasibility of a unified (theory-neutral) approach to discourse segmentation, a process that divides a text into the minimal discourse units involved in a coherence relation. We apply a RandomForest and an LSTM-based approach to all data sets, and we improve over a simple baseline that assumes sentence or clause-like segmentation. Performance, however, varies considerably depending on language and, more importantly, genre, with f-scores ranging from 73.00 to 94.47.
- Anthology ID:
- W19-2714
- Volume:
- Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, MN
- Editors:
- Amir Zeldes, Debopam Das, Erick Maziero Galani, Juliano Desiderato Antonio, Mikel Iruskieta
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 105–114
- URL:
- https://aclanthology.org/W19-2714
- DOI:
- 10.18653/v1/W19-2714
- Cite (ACL):
- Peter Bourgonje and Robin Schäfer. 2019. Multi-lingual and Cross-genre Discourse Unit Segmentation. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, pages 105–114, Minneapolis, MN. Association for Computational Linguistics.
- Cite (Informal):
- Multi-lingual and Cross-genre Discourse Unit Segmentation (Bourgonje & Schäfer, NAACL 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/W19-2714.pdf
- Data
- DISRPT2019