Predicting Discourse Trees from Transformer-based Neural Summarizers

Wen Xiao, Patrick Huber, Giuseppe Carenini


Abstract
Previous work indicates that discourse information benefits summarization. In this paper, we explore whether this synergy between discourse and summarization is bidirectional, by inferring document-level discourse trees from pre-trained neural summarizers. In particular, we generate unlabeled RST-style discourse trees from the self-attention matrices of the transformer model. Experiments across models and datasets reveal that the summarizer learns both dependency- and constituency-style discourse information, which is typically encoded in a single head and covers both long- and short-distance discourse dependencies. Overall, the experimental results suggest that the learned discourse information is general and transferable across domains.
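To make the idea of reading an unlabeled tree off a self-attention matrix concrete, the following is a minimal sketch and not necessarily the procedure used in the paper. It assumes the attention weights have already been aggregated to one row and column per discourse unit, and it splits each span greedily at the point where the two halves attend to each other least. The names `mean_block`, `build_tree`, and `toy_att`, as well as the toy values, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple, Union

# A tree is either a single unit index (leaf) or a pair of subtrees.
Tree = Union[int, Tuple["Tree", "Tree"]]


def mean_block(att: List[List[float]], rows: Sequence[int], cols: Sequence[int]) -> float:
    """Average attention weight from the units in `rows` to the units in `cols`."""
    total = sum(att[r][c] for r in rows for c in cols)
    return total / (len(rows) * len(cols))


def build_tree(att: List[List[float]], lo: int, hi: int) -> Tree:
    """Build an unlabeled binary tree over units lo..hi (inclusive).

    Greedy top-down heuristic (an assumption of this sketch): split the span
    where the average cross-attention between the two halves is lowest.
    """
    if lo == hi:
        return lo
    best_k, best_score = lo, float("inf")
    for k in range(lo, hi):
        left = range(lo, k + 1)
        right = range(k + 1, hi + 1)
        # Symmetrize: average the left-to-right and right-to-left attention.
        cross = (mean_block(att, left, right) + mean_block(att, right, left)) / 2
        if cross < best_score:
            best_k, best_score = k, cross
    return (build_tree(att, lo, best_k), build_tree(att, best_k + 1, hi))


# Toy unit-level attention matrix: units 0 and 1 attend to each other,
# as do units 2 and 3 (illustrative values, not real model output).
toy_att = [
    [0.1, 0.7, 0.1, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.7, 0.1],
]

tree = build_tree(toy_att, 0, 3)
print(tree)  # ((0, 1), (2, 3))
```

A greedy split is used here only because it is short; a dynamic-programming search over all binary trees would be the more thorough choice for the same scoring idea.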
Anthology ID:
2021.naacl-main.326
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
4139–4152
URL:
https://aclanthology.org/2021.naacl-main.326
DOI:
10.18653/v1/2021.naacl-main.326
Cite (ACL):
Wen Xiao, Patrick Huber, and Giuseppe Carenini. 2021. Predicting Discourse Trees from Transformer-based Neural Summarizers. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4139–4152, Online. Association for Computational Linguistics.
Cite (Informal):
Predicting Discourse Trees from Transformer-based Neural Summarizers (Xiao et al., NAACL 2021)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2021.naacl-main.326.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2021.naacl-main.326.mp4
Code:
Wendy-Xiao/summ_guided_disco_parser
Data:
CNN/Daily Mail