Quantifying word complexity for Leichte Sprache: A computational metric and its psycholinguistic validation

Umesh Patil, Jesus Calvillo, Sol Lago, Anne-Kathrin Schumann


Abstract
Leichte Sprache (Easy Language or Easy German) is a strongly simplified version of German geared toward a target group with limited language proficiency. In Germany, public bodies are required to provide information in Leichte Sprache. Unfortunately, Leichte Sprache rules are traditionally defined by non-linguists: they are not rooted in linguistic research, and they do not provide precise decision criteria or devices for measuring the complexity of linguistic structures (Bock and Pappert, 2023). For instance, one of the rules simply recommends using simple rather than complex words. In this paper, we therefore propose a model to determine word complexity. We train an XGBoost model to classify word complexity by leveraging word-level linguistic features, corpus-level distributional features, frequency information from an in-house Leichte Sprache corpus, and human complexity annotations. We psycholinguistically validate our model by showing that it captures human word recognition times above and beyond traditional word-level predictors. Moreover, we discuss a number of practical applications of our classifier, such as the evaluation of AI-simplified text and the detection of CEFR levels of words. To our knowledge, this is one of the first attempts to systematically quantify word complexity in the context of Leichte Sprache and to link it directly to real-time word processing.
Anthology ID:
2025.aielpl-1.9
Volume:
Proceedings of the 1st Workshop on Artificial Intelligence and Easy and Plain Language in Institutional Contexts (AI & EL/PL)
Month:
June
Year:
2025
Address:
Geneva, Switzerland
Editors:
María Isabel Rivas Ginel, Patrick Cadwell, Paolo Canavese, Silvia Hansen-Schirra, Martin Kappus, Anna Matamala, Will Noonan
Venue:
AIELPL
Publisher:
European Association for Machine Translation
Pages:
94–107
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.aielpl-1.9/
Cite (ACL):
Umesh Patil, Jesus Calvillo, Sol Lago, and Anne-Kathrin Schumann. 2025. Quantifying word complexity for Leichte Sprache: A computational metric and its psycholinguistic validation. In Proceedings of the 1st Workshop on Artificial Intelligence and Easy and Plain Language in Institutional Contexts (AI & EL/PL), pages 94–107, Geneva, Switzerland. European Association for Machine Translation.
Cite (Informal):
Quantifying word complexity for Leichte Sprache: A computational metric and its psycholinguistic validation (Patil et al., AIELPL 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.aielpl-1.9.pdf