Abstract
Semantic Overlap Summarization (SOS) is a novel and relatively under-explored seq-to-seq task that entails summarizing the common information shared across multiple alternate narratives. One of the major challenges in solving this task is the lack of existing datasets for supervised training. To address this challenge, we propose a novel data augmentation technique that allows us to create a large amount of synthetic data for training a seq-to-seq model to perform the SOS task. Through extensive experiments on narratives from the news domain, we show that models fine-tuned on the synthetic dataset provide significant performance improvements over pre-trained vanilla summarization techniques and come close to models fine-tuned on the golden training data, which demonstrates the effectiveness of our proposed data augmentation technique for training seq-to-seq models on the SOS task.
- Anthology ID: 2022.emnlp-main.807
- Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2022
- Address: Abu Dhabi, United Arab Emirates
- Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 11765–11777
- URL: https://aclanthology.org/2022.emnlp-main.807
- DOI: 10.18653/v1/2022.emnlp-main.807
- Cite (ACL): Naman Bansal, Mousumi Akter, and Shubhra Kanti Karmaker Santu. 2022. Learning to Generate Overlap Summaries through Noisy Synthetic Data. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11765–11777, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal): Learning to Generate Overlap Summaries through Noisy Synthetic Data (Bansal et al., EMNLP 2022)
- PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2022.emnlp-main.807.pdf