Reproduction of German Text Simplification Systems

Regina Stodden


Abstract
This paper investigates the reproducibility of various approaches to automatic German text simplification and identifies key challenges in the process. We reproduce eight sentence simplification systems, including rule-based models, fine-tuned models, and prompting of autoregressive models. We highlight three main reproducibility issues: reproduction that is impossible because of missing details, missing code, or restricted access to data or models; variations in reproduction that hinder meaningful comparisons; and discrepancies between reported and reproduced evaluation scores. To enhance reproducibility and facilitate model comparison, we recommend publishing model-related details, including checkpoints, code, and training methodologies. Our study also emphasizes the importance of releasing system generations, when possible, for thorough analysis and a better understanding of the original works. To compare the reproduced models, we also create a German sentence simplification benchmark of the eight models across six test sets. Overall, the study underscores the significance of transparency, documentation, and diverse training data for advancing reproducibility and meaningful model comparison in automated German text simplification.
Anthology ID:
2024.determit-1.1
Volume:
Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Giorgio Maria Di Nunzio, Federica Vezzani, Liana Ermakova, Hosein Azarbonyad, Jaap Kamps
Venues:
DeTermIt | WS
Publisher:
ELRA and ICCL
Pages:
1–15
URL:
https://aclanthology.org/2024.determit-1.1
Cite (ACL):
Regina Stodden. 2024. Reproduction of German Text Simplification Systems. In Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024, pages 1–15, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Reproduction of German Text Simplification Systems (Stodden, DeTermIt-WS 2024)
PDF:
https://preview.aclanthology.org/naacl24-info/2024.determit-1.1.pdf