Abstract
Automatic text simplification (ATS) describes the automatic transformation of a text from a complex form to a less complex form. Many modern ATS techniques need large parallel corpora of standard and simplified text, but such data does not exist for many languages. One way to overcome this issue is to create pseudo-parallel corpora by dividing existing corpora into standard and simple parts. In this work, we explore the creation of Swedish pseudo-parallel monolingual corpora by the application of different feature representation methods, sentence alignment algorithms, and indexing approaches, on a large monolingual corpus. The different corpora are used to fine-tune a sentence simplification system based on BART, which is evaluated with standard evaluation metrics for automatic text simplification.

- Anthology ID:
- 2023.nodalida-1.13
- Volume:
- Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
- Month:
- May
- Year:
- 2023
- Address:
- Tórshavn, Faroe Islands
- Editors:
- Tanel Alumäe, Mark Fishel
- Venue:
- NoDaLiDa
- Publisher:
- University of Tartu Library
- Pages:
- 113–123
- URL:
- https://aclanthology.org/2023.nodalida-1.13
- Cite (ACL):
- Daniel Holmer and Evelina Rennes. 2023. Constructing Pseudo-parallel Swedish Sentence Corpora for Automatic Text Simplification. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 113–123, Tórshavn, Faroe Islands. University of Tartu Library.
- Cite (Informal):
- Constructing Pseudo-parallel Swedish Sentence Corpora for Automatic Text Simplification (Holmer & Rennes, NoDaLiDa 2023)
- PDF:
- https://preview.aclanthology.org/teach-a-man-to-fish/2023.nodalida-1.13.pdf