Abstract
Randomly masking text spans in ordinary texts during pre-training hardly allows models to acquire the ability to generate simple texts, and it can hurt the performance of pre-trained models on text simplification tasks. In this paper, we propose a new continued pre-training strategy that teaches the pre-trained model to generate simple texts. We continue pre-training BART, a representative model, to obtain SimpleBART. SimpleBART consistently and significantly improves over BART on lexical simplification, sentence simplification, and document-level simplification tasks. Finally, we compare SimpleBART with several representative large language models (LLMs).
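The abstract does not spell out the training mechanics, so the following is only an illustrative sketch of what continued pre-training of BART on simple texts via span infilling could look like, using Hugging Face Transformers and PyTorch. The corpus, masking ratio, and hyperparameters here are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch: continued pre-training of BART on simple texts with span infilling.
# Corpus contents, mask_ratio, and optimizer settings are illustrative assumptions,
# not the recipe used to build SimpleBART.
import random
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def mask_span(text, mask_ratio=0.3):
    """Replace one contiguous span of words with BART's single <mask> token (text infilling)."""
    words = text.split()
    span_len = max(1, int(len(words) * mask_ratio))
    start = random.randrange(0, max(1, len(words) - span_len + 1))
    corrupted = words[:start] + [tokenizer.mask_token] + words[start + span_len:]
    return " ".join(corrupted)

simple_sentences = [
    "The cat sat on the mat.",                 # placeholder simple-text corpus
    "She walked to the store to buy milk.",
]

model.train()
for epoch in range(3):
    for sentence in simple_sentences:
        inputs = tokenizer(mask_span(sentence), return_tensors="pt", truncation=True)
        labels = tokenizer(sentence, return_tensors="pt", truncation=True).input_ids
        loss = model(**inputs, labels=labels).loss  # reconstruct the original simple sentence
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

A real run would replace the placeholder sentences with a large corpus of simple texts and use the paper's own corruption strategy; the sketch only shows the mechanics of masked-span reconstruction on simple text.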
- Anthology ID: 2023.findings-acl.595
- Volume: Findings of the Association for Computational Linguistics: ACL 2023
- Month: July
- Year: 2023
- Address: Toronto, Canada
- Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 9345–9355
- URL: https://aclanthology.org/2023.findings-acl.595
- DOI: 10.18653/v1/2023.findings-acl.595
- Cite (ACL): Renliang Sun, Wei Xu, and Xiaojun Wan. 2023. Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9345–9355, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal): Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification (Sun et al., Findings 2023)
- PDF: https://preview.aclanthology.org/naacl24-info/2023.findings-acl.595.pdf