Simona Kuoraitė


2025

Simplifying Lithuanian text into Easy-to-Read language using large language models
Simona Kuoraitė | Valentas Gružauskas
Proceedings of the 1st Workshop on Artificial Intelligence and Easy and Plain Language in Institutional Contexts (AI & EL/PL)

This paper explores the task of simplifying Lithuanian text into Easy-to-Read language. Easy-to-Read language is text written in short, clear sentences and simple words, adapted for people with intellectual disabilities or limited language skills. The aim of this work is to investigate how the large language model Lt-Llama-2-7b-hf, pre-trained on Lithuanian language data, can be adapted to this task. To achieve this goal, specialized datasets were developed to fine-tune the model, and experiments were carried out. The model was tested in two ways: by presenting the texts in their original form, and by presenting the texts together with a prompt adapted to the task. The results were evaluated using the SARI metric, which assesses the quality of simplified texts, and through a qualitative evaluation of the large language model. The results show that the fine-tuned model sometimes simplifies text better than the model without fine-tuning, but that a larger dataset would be needed to achieve significant results, and that more research should be carried out on fine-tuning the model for this task.