Quantifying Appropriateness of Summarization Data for Curriculum Learning
Ryuji Kano, Takumi Takahashi, Toru Nishino, Motoki Taniguchi, Tomoki Taniguchi, Tomoko Ohkuma
Abstract
Much research has reported that the training data of summarization models are noisy: summaries often do not reflect what is written in the source texts. We propose an effective curriculum learning method for training summarization models on such noisy data. Curriculum learning is used to train sequence-to-sequence models with noisy data. In translation tasks, previous research quantified the noise of the training data using two models trained on noisy and clean corpora. Because such corpora do not exist in the summarization field, we propose a model that can quantify noise from a single noisy corpus. We conduct experiments on three summarization models, one pretrained and two non-pretrained, and verify that our method improves performance. Furthermore, we analyze how different curricula affect the performance of pretrained and non-pretrained summarization models. Our human evaluation results also show that our method improves the performance of summarization models.
- Anthology ID:
- 2021.eacl-main.119
- Volume:
- Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
- Month:
- April
- Year:
- 2021
- Address:
- Online
- Editors:
- Paola Merlo, Jorg Tiedemann, Reut Tsarfaty
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1395–1405
- URL:
- https://aclanthology.org/2021.eacl-main.119
- DOI:
- 10.18653/v1/2021.eacl-main.119
- Cite (ACL):
- Ryuji Kano, Takumi Takahashi, Toru Nishino, Motoki Taniguchi, Tomoki Taniguchi, and Tomoko Ohkuma. 2021. Quantifying Appropriateness of Summarization Data for Curriculum Learning. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1395–1405, Online. Association for Computational Linguistics.
- Cite (Informal):
- Quantifying Appropriateness of Summarization Data for Curriculum Learning (Kano et al., EACL 2021)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2021.eacl-main.119.pdf
- Data
- Reddit TIFU