Data Selection Curriculum for Abstractive Text Summarization
Shichao Sun, Ruifeng Yuan, Jianfei He, Ziqiang Cao, Wenjie Li, Xiaohua Jia
Abstract
Abstractive Text Summarization (ATS) models are commonly trained on large-scale data that is randomly shuffled. However, the impact of data selection and data ordering on ATS models remains a relatively unexplored research area, where a significant challenge lies in accurately assessing the learning difficulty of each training instance. This study introduces a Data Selection Curriculum (DSC) scoring system that incorporates both the difficulty of improving the ATS model via an instance and the expected performance on this instance. By selectively excluding excessively simple and overly complex instances, training efficiency can be improved. Furthermore, curriculum learning is integrated to accelerate convergence and improve performance by gradually increasing the learning difficulty, inspired by how humans learn. Experimental results on the CNN/DailyMail dataset demonstrate that our approach surpasses strong baselines while using only 20% of the available training instances.
- Anthology ID:
- 2023.findings-emnlp.537
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 7990–7995
- URL:
- https://aclanthology.org/2023.findings-emnlp.537
- DOI:
- 10.18653/v1/2023.findings-emnlp.537
- Cite (ACL):
- Shichao Sun, Ruifeng Yuan, Jianfei He, Ziqiang Cao, Wenjie Li, and Xiaohua Jia. 2023. Data Selection Curriculum for Abstractive Text Summarization. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7990–7995, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Data Selection Curriculum for Abstractive Text Summarization (Sun et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2023.findings-emnlp.537.pdf
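The selection-and-ordering idea described in the abstract — drop overly simple and overly complex instances by a difficulty score, then train on the remainder from easy to hard — can be sketched as below. This is an illustrative sketch only, not the paper's actual DSC scoring: the function name, cutoff fractions, and example scores are assumptions introduced for the demonstration.

```python
def select_and_order(instances, scores, easy_cut=0.4, hard_cut=0.4):
    """Curriculum-style data selection sketch.

    instances: list of training examples.
    scores:    matching per-instance difficulty scores (higher = harder);
               stand-ins for the paper's DSC scores.
    Drops the easiest `easy_cut` and hardest `hard_cut` fractions,
    and returns the middle band ordered from easy to hard.
    """
    # Rank instance indices by ascending difficulty.
    ranked = sorted(zip(scores, range(len(instances))))
    n = len(ranked)
    lo = int(n * easy_cut)        # drop the excessively simple head
    hi = n - int(n * hard_cut)    # drop the overly complex tail
    middle = ranked[lo:hi]        # kept band, already easy-to-hard
    return [instances[i] for _, i in middle]


# Toy usage: with 40% cut from each end, 20% of the data remains,
# mirroring the paper's "20% of the available instances" setting.
docs = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
difficulty = [0.05, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95]
subset = select_and_order(docs, difficulty)
```

In practice the kept subset would then be fed to the summarizer in this easy-to-hard order (optionally in difficulty-staged batches) rather than randomly shuffled.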