Neural Text Simplification in Low-Resource Conditions Using Weak Supervision
Alessio Palmero Aprosio, Sara Tonelli, Marco Turchi, Matteo Negri, Mattia A. Di Gangi
Abstract
Neural text simplification has gained increasing attention in the NLP community thanks to recent advances in deep sequence-to-sequence learning. Most recent efforts with this data-demanding paradigm have dealt with English, for which sizeable training datasets are available to train competitive models. Similar improvements for less resource-rich languages depend either on intensive manual work to create training data or on the design of effective automatic generation techniques that bypass the data acquisition bottleneck. Inspired by the machine translation field, in which synthetic parallel pairs generated from monolingual data yield significant improvements to neural models, in this paper we exploit large amounts of heterogeneous data to automatically select simple sentences, which are then used to create synthetic simplification pairs. We also evaluate other solutions, such as oversampling and the use of external word embeddings to be fed to the neural simplification system. Our approach is evaluated on Italian and Spanish, for which only a few thousand gold sentence pairs are available. The results show that these techniques yield performance improvements over a baseline sequence-to-sequence configuration.
- Anthology ID: W19-2305
- Volume: Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation
- Month: June
- Year: 2019
- Address: Minneapolis, Minnesota
- Editors: Antoine Bosselut, Asli Celikyilmaz, Marjan Ghazvininejad, Srinivasan Iyer, Urvashi Khandelwal, Hannah Rashkin, Thomas Wolf
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 37–44
- URL: https://aclanthology.org/W19-2305
- DOI: 10.18653/v1/W19-2305
- Cite (ACL): Alessio Palmero Aprosio, Sara Tonelli, Marco Turchi, Matteo Negri, and Mattia A. Di Gangi. 2019. Neural Text Simplification in Low-Resource Conditions Using Weak Supervision. In Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation, pages 37–44, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal): Neural Text Simplification in Low-Resource Conditions Using Weak Supervision (Palmero Aprosio et al., NAACL 2019)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/W19-2305.pdf
- Data: Newsela