RALS: Resources and Baselines for Romanian Automatic Lexical Simplification

Fabian Anghel, Cristea Petru-Theodor, Claudiu Creanga, Sergiu Nisioi


Abstract
We introduce the first Romanian dataset that jointly covers lexical complexity prediction (LCP) and lexical simplification (LS) annotations, along with a comparison of lexical simplification approaches. We propose a methodology for ordering simplification suggestions using a pairwise ranking approximation method, arranging candidates from simple to complex based on a separate set of human judgments. In addition, we provide human lexical complexity annotations for 3,921 word samples in context. Finally, we explore several novel pipelines for complexity prediction and simplification and present the first text simplification system for Romanian.
Anthology ID:
2025.emnlp-main.1603
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
31469–31480
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1603/
Cite (ACL):
Fabian Anghel, Cristea Petru-Theodor, Claudiu Creanga, and Sergiu Nisioi. 2025. RALS: Resources and Baselines for Romanian Automatic Lexical Simplification. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31469–31480, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
RALS: Resources and Baselines for Romanian Automatic Lexical Simplification (Anghel et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1603.pdf
Checklist:
 2025.emnlp-main.1603.checklist.pdf