Cristea Petru-Theodor


2025

RALS: Resources and Baselines for Romanian Automatic Lexical Simplification
Fabian Anghel | Cristea Petru-Theodor | Claudiu Creanga | Sergiu Nisioi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We introduce the first dataset for Romanian that jointly covers lexical complexity prediction (LCP) annotations and lexical simplification (LS), along with a comparison of lexical simplification approaches. We propose a methodology for ordering simplification suggestions using a pairwise ranking approximation method, arranging candidates from simple to complex based on a separate set of human judgments. In addition, we provide human lexical complexity annotations for 3,921 word samples in context. Finally, we explore several novel pipelines for complexity prediction and simplification and present the first text simplification system for Romanian.