Joint Learning of Syntactic Features Helps Discourse Segmentation

Takshak Desai; Parag Pravin Dakle; Dan Moldovan

Abstract

This paper describes an accurate framework for multilingual discourse segmentation with BERT (Devlin et al., 2019). The model identifies segments by casting the problem as token classification while jointly learning syntactic features such as part-of-speech tags and dependency relations, which leads to significant improvements in performance. Experiments are performed in different languages, such as English, Dutch, German, Brazilian Portuguese and Basque, to highlight the cross-lingual effectiveness of the segmenter. In particular, the model achieves a state-of-the-art F-score of 96.7 on the RST-DT corpus (Carlson et al., 2003), improving on the previous best model by 7.2%. Additionally, an analysis of errors made on the test data provides a qualitative explanation of how the proposed changes contribute to model performance.
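The abstract frames segmentation as token classification with part-of-speech tags and dependency relations learned jointly. The pure-Python sketch below illustrates one plausible reading of that setup; it is not the authors' implementation (the linked repository holds that). Assumptions, all hypothetical: EDU boundaries are encoded as a "B" tag on the first token of each segment and "O" elsewhere, each task has its own per-token classification head whose logits are simply given here (in the real model they would come from heads over BERT token representations), and the joint objective is a weighted sum of per-task mean cross-entropy. The function names and default weights are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# (per-token logits, per-token gold class indices) for one task
TaskBatch = Tuple[Sequence[Sequence[float]], Sequence[int]]


def segments_to_labels(segments: List[List[str]]) -> List[str]:
    """Flatten EDUs into one token sequence with per-token labels.

    Assumed scheme: "B" marks the first token of each EDU (a segment
    boundary) and "O" marks every other token.
    """
    labels: List[str] = []
    for seg in segments:
        for i in range(len(seg)):
            labels.append("B" if i == 0 else "O")
    return labels


def cross_entropy(logits: Sequence[float], gold: int) -> float:
    """Negative log-softmax of the gold class, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[gold]


def mean_token_loss(logits: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Average cross-entropy over the tokens of one task."""
    return sum(cross_entropy(l, g) for l, g in zip(logits, gold)) / len(gold)


def joint_loss(
    seg: TaskBatch,
    pos: TaskBatch,
    dep: TaskBatch,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of segmentation, POS-tag and dependency-relation losses."""
    w_seg, w_pos, w_dep = weights
    return (
        w_seg * mean_token_loss(*seg)
        + w_pos * mean_token_loss(*pos)
        + w_dep * mean_token_loss(*dep)
    )


if __name__ == "__main__":
    edus = [["The", "cat", "sat"], ["because", "it", "was", "tired"]]
    print(segments_to_labels(edus))  # ['B', 'O', 'O', 'B', 'O', 'O', 'O']
```

Sharing one encoder across the three heads is what lets the syntactic tasks shape the token representations used for boundary detection; setting the POS and dependency weights to zero recovers a plain single-task segmenter under these assumptions.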

Anthology ID: 2020.lrec-1.135
Volume: Proceedings of the 12th Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 1073–1080
Language: English
URL: https://aclanthology.org/2020.lrec-1.135
Cite (ACL): Takshak Desai, Parag Pravin Dakle, and Dan Moldovan. 2020. Joint Learning of Syntactic Features Helps Discourse Segmentation. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 1073–1080, Marseille, France. European Language Resources Association.
Cite (Informal): Joint Learning of Syntactic Features Helps Discourse Segmentation (Desai et al., LREC 2020)
PDF: https://preview.aclanthology.org/update-css-js/2020.lrec-1.135.pdf
Code: takshakpdesai/discourse-segmenter