Abstract
Discourse segmentation and sentence-level discourse parsing play important roles for various NLP tasks that need to consider textual coherence. Despite recent achievements in both tasks, there is still room for improvement due to the scarcity of labeled data. To address this problem, we propose a language model-based generative classifier (LMGC) that uses more information from labels by treating them as part of the input, while enhancing label representations with an embedded description for each label. Moreover, since this lets LMGC prepare representations for labels unseen during pre-training, we can effectively use a pre-trained language model in LMGC. Experimental results on the RST-DT dataset show that LMGC achieves a state-of-the-art F1 score of 96.72 in discourse segmentation. It further achieves state-of-the-art relation F1 scores of 84.69 with gold EDU boundaries and 81.18 with automatically segmented boundaries in sentence-level discourse parsing.
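The sketch below illustrates the general idea of a generative classifier over label descriptions: each candidate relation is scored by how well a pre-trained language model (here, GPT-2 via Hugging Face Transformers) models the input followed by a textual description of the label, and the most likely label is chosen. The label set, descriptions, prompt format, and mean negative log-likelihood scoring are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's exact model): a generative classifier that
# scores each candidate discourse relation by the LM likelihood of the input
# text followed by a description of that label.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Hypothetical label descriptions; the paper embeds a description for each label.
LABEL_DESCRIPTIONS = {
    "Elaboration": "the second span gives additional detail about the first span",
    "Attribution": "the second span reports who said or thought the first span",
    "Contrast": "the two spans present contrasting information",
}

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def classify(edu_pair: str) -> str:
    """Return the label whose description the LM finds most likely after the input."""
    best_label, best_nll = None, float("inf")
    for label, description in LABEL_DESCRIPTIONS.items():
        text = f"{edu_pair} Relation: {description}"
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            # The LM loss is the mean negative log-likelihood of the sequence.
            nll = model(ids, labels=ids).loss.item()
        if nll < best_nll:
            best_label, best_nll = label, nll
    return best_label

print(classify("[The company reported losses,] [which surprised analysts.]"))
```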
- Anthology ID: 2021.emnlp-main.188
- Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2021
- Address: Online and Punta Cana, Dominican Republic
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 2432–2446
- URL: https://aclanthology.org/2021.emnlp-main.188
- DOI: 10.18653/v1/2021.emnlp-main.188
- Cite (ACL): Ying Zhang, Hidetaka Kamigaito, and Manabu Okumura. 2021. A Language Model-based Generative Classifier for Sentence-level Discourse Parsing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2432–2446, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): A Language Model-based Generative Classifier for Sentence-level Discourse Parsing (Zhang et al., EMNLP 2021)
- PDF: https://aclanthology.org/2021.emnlp-main.188.pdf