Learning to Generate Overlap Summaries through Noisy Synthetic Data

Naman Bansal, Mousumi Akter, Shubhra Kanti Karmaker Santu


Abstract
Semantic Overlap Summarization (SOS) is a novel and relatively under-explored seq-to-seq task that entails summarizing the common information from multiple alternate narratives. One of the major challenges in solving this task is the lack of existing datasets for supervised training. To address this challenge, we propose a novel data augmentation technique that allows us to create a large amount of synthetic data for training a seq-to-seq model to perform the SOS task. Through extensive experiments using narratives from the news domain, we show that models fine-tuned on the synthetic dataset provide significant performance improvements over pre-trained vanilla summarization techniques and come close to models fine-tuned on the golden training data, which demonstrates the effectiveness of our proposed data augmentation technique for training seq-to-seq models on the SOS task.
Anthology ID: 2022.emnlp-main.807
Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 11765–11777
URL: https://aclanthology.org/2022.emnlp-main.807
DOI: 10.18653/v1/2022.emnlp-main.807
Cite (ACL): Naman Bansal, Mousumi Akter, and Shubhra Kanti Karmaker Santu. 2022. Learning to Generate Overlap Summaries through Noisy Synthetic Data. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11765–11777, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal): Learning to Generate Overlap Summaries through Noisy Synthetic Data (Bansal et al., EMNLP 2022)
PDF: https://preview.aclanthology.org/ingest-2024-clasp/2022.emnlp-main.807.pdf