RALS: Resources and Baselines for Romanian Automatic Lexical Simplification
Fabian Anghel, Cristea Petru-Theodor, Claudiu Creanga, Sergiu Nisioi
Abstract
We introduce the first dataset that jointly covers both lexical complexity prediction (LCP) annotations and lexical simplification (LS) for Romanian, along with a comparison of lexical simplification approaches. We propose a methodology for ordering simplification suggestions using a pairwise ranking approximation method, arranging candidates from simple to complex based on a separate set of human judgments. In addition, we provide human lexical complexity annotations for 3,921 word samples in context. Finally, we explore several novel pipelines for complexity prediction and simplification and present the first text simplification system for Romanian.
- Anthology ID: 2025.emnlp-main.1603
- Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 31469–31480
- URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1603/
- Cite (ACL): Fabian Anghel, Cristea Petru-Theodor, Claudiu Creanga, and Sergiu Nisioi. 2025. RALS: Resources and Baselines for Romanian Automatic Lexical Simplification. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31469–31480, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): RALS: Resources and Baselines for Romanian Automatic Lexical Simplification (Anghel et al., EMNLP 2025)
- PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1603.pdf