Towards Creating a Bulgarian Readability Index

Dimitar Kazakov, Stefan Minkov, Ruslana Margova, Irina Temnikova, Ivo Emauilov


Abstract
Readability assessment plays a crucial role in education and text accessibility. While numerous indices exist for English and have been extended to Romance and Slavic languages, Bulgarian remains underserved in this regard. This paper reviews established readability metrics across these language families, examining their underlying features and modelling methods. We then report the first attempt to develop a readability index for Bulgarian, using end-of-school-year assessment questions and literary works targeted at children of various ages. Key linguistic attributes, namely word length, sentence length, syllable count, and information content (based on word frequency), were extracted, and their first two statistical moments, mean and variance, were modelled against grade levels using linear and polynomial regression. Results suggest that polynomial models outperform linear ones by capturing non-linear relationships between textual features and perceived difficulty, though they may be harder to interpret. This work provides an initial framework for building a reliable readability measure for Bulgarian, with applications in educational text design, adaptive learning, and corpus annotation.
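The feature-extraction and regression steps summarised above can be illustrated with a minimal sketch. This is not the authors' implementation, and several details are assumptions: syllables are approximated by counting Bulgarian vowel letters, information content is taken as -log2 p(w) with add-one smoothing over a caller-supplied frequency table (the paper's corpus and estimator are not given here), and grade level is regressed on a single feature with an ordinary least-squares polynomial. All function names (`features`, `polyfit`, `rss`) are hypothetical.

```python
import math
import re
import statistics
from collections import Counter

# Assumption: one Bulgarian syllable per vowel letter (a common approximation).
BG_VOWELS = set("аеиоуъюя")


def syllables(word: str) -> int:
    """Approximate syllable count as the number of vowel letters (minimum 1)."""
    return max(1, sum(ch in BG_VOWELS for ch in word.lower()))


def sentences(text: str) -> list[list[str]]:
    """Split on ., ! or ? and return each non-empty sentence as word tokens."""
    parts = re.split(r"[.!?]+", text)
    # [^\W\d_]+ matches runs of Unicode letters, including Cyrillic.
    return [toks for p in parts if (toks := re.findall(r"[^\W\d_]+", p))]


def features(text: str, freq: Counter) -> dict[str, float]:
    """Mean and population variance of four hypothetical readability attributes."""
    sents = sentences(text)
    words = [w.lower() for s in sents for w in s]
    total, vocab = sum(freq.values()), len(freq)
    # Assumed estimator: add-one smoothing with one extra bucket for unseen words.
    info = [-math.log2((freq[w] + 1) / (total + vocab + 1)) for w in words]
    series = {
        "word_len": [len(w) for w in words],
        "sent_len": [len(s) for s in sents],
        "syllables": [syllables(w) for w in words],
        "info": info,
    }
    out = {}
    for name, xs in series.items():
        out[f"{name}_mean"] = statistics.mean(xs)
        out[f"{name}_var"] = statistics.pvariance(xs)
    return out


def polyfit(xs: list[float], ys: list[float], degree: int) -> list[float]:
    """Least-squares polynomial fit via the normal equations; coef[k] multiplies x**k."""
    n = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef


def rss(coef: list[float], xs: list[float], ys: list[float]) -> float:
    """Residual sum of squares of a fitted polynomial on (xs, ys)."""
    return sum((y - sum(c * x ** k for k, c in enumerate(coef))) ** 2
               for x, y in zip(xs, ys))


if __name__ == "__main__":
    toy_freq = Counter({"котката": 2, "спи": 1})  # toy counts, not the paper's data
    print(features("Котката спи. Кучето лае силно!", toy_freq))
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]          # e.g. a feature value per text (toy)
    ys = [x * x + 1 for x in xs]            # e.g. grade level per text (toy)
    print(rss(polyfit(xs, ys, 1), xs, ys), rss(polyfit(xs, ys, 2), xs, ys))
```

Because the degree-1 model is nested in the degree-2 model, the quadratic fit's residual sum of squares on the training data can never exceed the linear one; whether that advantage carries over to unseen texts would need held-out evaluation, and the extra terms make the coefficients less directly interpretable.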
Anthology ID: 2025.lowresnlp-1.18
Volume: Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages
Month: September
Year: 2025
Address: Varna, Bulgaria
Editors: Ernesto Luis Estevanell-Valladares, Alicia Picazo-Izquierdo, Tharindu Ranasinghe, Besik Mikaberidze, Simon Ostermann, Daniil Gurgurov, Philipp Mueller, Claudia Borg, Marián Šimko
Venues: LowResNLP | WS
Publisher: INCOMA Ltd., Shoumen, Bulgaria
Pages: 192–200
URL: https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.18/
Cite (ACL): Dimitar Kazakov, Stefan Minkov, Ruslana Margova, Irina Temnikova, and Ivo Emauilov. 2025. Towards Creating a Bulgarian Readability Index. In Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages, pages 192–200, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal): Towards Creating a Bulgarian Readability Index (Kazakov et al., LowResNLP 2025)
PDF: https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.18.pdf