Improving Topic Segmentation by Injecting Discourse Dependencies

Linzi Xing, Patrick Huber, Giuseppe Carenini


Abstract
Recent neural supervised topic segmentation models achieve superior effectiveness over unsupervised methods, owing to the availability of large-scale training corpora sampled from Wikipedia. These models may, however, suffer from limited robustness and transferability because they exploit simple linguistic cues for prediction while overlooking more important inter-sentential topical consistency. To address this issue, we present a discourse-aware neural topic segmentation model that injects above-sentence discourse dependency structures to encourage the model to base its topic boundary predictions more on the topical consistency between sentences. Our empirical study on English evaluation datasets shows that injecting above-sentence discourse structures into a neural topic segmenter with our proposed strategy can substantially improve its performance on both intra-domain and out-of-domain data, with little increase in the model’s complexity.
Anthology ID:
2022.codi-1.2
Volume:
Proceedings of the 3rd Workshop on Computational Approaches to Discourse
Month:
October
Year:
2022
Address:
Gyeongju, Republic of Korea and Online
Venue:
CODI
Publisher:
International Conference on Computational Linguistics
Pages:
7–18
URL:
https://aclanthology.org/2022.codi-1.2
Cite (ACL):
Linzi Xing, Patrick Huber, and Giuseppe Carenini. 2022. Improving Topic Segmentation by Injecting Discourse Dependencies. In Proceedings of the 3rd Workshop on Computational Approaches to Discourse, pages 7–18, Gyeongju, Republic of Korea and Online. International Conference on Computational Linguistics.
Cite (Informal):
Improving Topic Segmentation by Injecting Discourse Dependencies (Xing et al., CODI 2022)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2022.codi-1.2.pdf
Data
WikiSection