Abstract
The first step in discourse analysis involves dividing a text into segments. We annotate the first high-quality small-scale medical corpus in English with discourse segments and analyze how well news-trained segmenters perform on this domain. While we expectedly find a drop in performance, the nature of the segmentation errors suggests some problems can be addressed earlier in the pipeline, while others would require expanding the corpus to a trainable size to learn the nuances of the medical domain.
- Anthology ID:
- W19-2704
- Volume:
- Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, MN
- Editors:
- Amir Zeldes, Debopam Das, Erick Maziero Galani, Juliano Desiderato Antonio, Mikel Iruskieta
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 22–29
- URL:
- https://aclanthology.org/W19-2704
- DOI:
- 10.18653/v1/W19-2704
- Cite (ACL):
- Elisa Ferracane, Titan Page, Junyi Jessy Li, and Katrin Erk. 2019. From News to Medical: Cross-domain Discourse Segmentation. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, pages 22–29, Minneapolis, MN. Association for Computational Linguistics.
- Cite (Informal):
- From News to Medical: Cross-domain Discourse Segmentation (Ferracane et al., NAACL 2019)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/W19-2704.pdf
- Code:
- elisaF/news-med-segmentation