An Empirical Study on Topic Preservation in Multi-Document Summarization

Mong Yuan Sim, Wei Emma Zhang, Congbo Ma


Abstract
Multi-document summarization (MDS) is the process of generating an informative and concise summary from multiple topic-related documents. Many studies have analyzed the quality of MDS datasets or models; however, no work has been done from the perspective of topic preservation. In this work, we fill the gap by performing an empirical analysis on two MDS datasets and studying topic preservation in summaries generated by 8 MDS models. Our key findings include: i) the Multi-News dataset has better gold summaries than Multi-XScience in terms of topic distribution consistency, and ii) extractive approaches perform better than abstractive approaches at preserving topic information from source documents. We hope our findings can help in developing summarization models that generate topic-focused summaries and can inspire researchers creating datasets for this challenging task.
Anthology ID:
2022.aacl-srw.9
Volume:
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop
Month:
November
Year:
2022
Address:
Online
Venues:
AACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
61–67
URL:
https://aclanthology.org/2022.aacl-srw.9
Cite (ACL):
Mong Yuan Sim, Wei Emma Zhang, and Congbo Ma. 2022. An Empirical Study on Topic Preservation in Multi-Document Summarization. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 61–67, Online. Association for Computational Linguistics.
Cite (Informal):
An Empirical Study on Topic Preservation in Multi-Document Summarization (Sim et al., AACL-IJCNLP 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.aacl-srw.9.pdf