Towards Creating a Bulgarian Readability Index
Dimitar Kazakov, Stefan Minkov, Ruslana Margova, Irina Temnikova, Ivo Emauilov
Abstract
Readability assessment plays a crucial role in education and text accessibility. While numerous indices exist for English and have been extended to Romance and Slavic languages, Bulgarian remains underserved in this regard. This paper reviews established readability metrics across these language families, examining their underlying features and modelling methods. We then report the first attempt to develop a readability index for Bulgarian, using end-of-school-year assessment questions and literary works targeted at children of various ages. Key linguistic attributes, namely, word length, sentence length, syllable count, and information content (based on word frequency), were extracted, and their first two statistical moments, mean and variance, were modelled against grade levels using linear and polynomial regression. Results suggest that polynomial models outperform linear ones by capturing non-linear relationships between textual features and perceived difficulty, but may be harder to interpret. This work provides an initial framework for building a reliable readability measure for Bulgarian, with applications in educational text design, adaptive learning, and corpus annotation.
- Anthology ID:
- 2025.lowresnlp-1.18
- Volume:
- Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages
- Month:
- September
- Year:
- 2025
- Address:
- Varna, Bulgaria
- Editors:
- Ernesto Luis Estevanell-Valladares, Alicia Picazo-Izquierdo, Tharindu Ranasinghe, Besik Mikaberidze, Simon Ostermann, Daniil Gurgurov, Philipp Mueller, Claudia Borg, Marián Šimko
- Venues:
- LowResNLP | WS
- Publisher:
- INCOMA Ltd., Shoumen, Bulgaria
- Pages:
- 192–200
- URL:
- https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.18/
- Cite (ACL):
- Dimitar Kazakov, Stefan Minkov, Ruslana Margova, Irina Temnikova, and Ivo Emauilov. 2025. Towards Creating a Bulgarian Readability Index. In Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages, pages 192–200, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal):
- Towards Creating a Bulgarian Readability Index (Kazakov et al., LowResNLP 2025)
- PDF:
- https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.18.pdf
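
The abstract describes extracting per-text features (word length, sentence length, syllable count, information content), summarising each by its mean and variance, and regressing grade level on them with linear and polynomial models. The sketch below illustrates that general pipeline only; it is not the authors' implementation. The sentence splitter, the vowel-counting syllable heuristic, the count-of-one floor for unseen words, and every function name (`extract_features`, `fit_polynomial`, `predict`) are assumptions introduced for illustration, and the frequency table is a toy stand-in for a real corpus.

```python
import math
import re
from statistics import mean, pvariance

# Assumption: syllables are approximated by counting Bulgarian vowel letters.
BG_VOWELS = set("аеиоуъюя")


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption, not the paper's method)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word tokens; the word-character class covers Cyrillic in Python 3."""
    return re.findall(r"\w+", sentence.lower())


def syllables(word):
    """Vowel count, floored at 1 so vowel-less tokens still count as one syllable."""
    return max(1, sum(ch in BG_VOWELS for ch in word))


def extract_features(text, word_freq):
    """Return the mean and (population) variance of each per-text feature."""
    sentences = [tokenize(s) for s in split_sentences(text)]
    sentences = [s for s in sentences if s]
    words = [w for s in sentences for w in s]
    total = sum(word_freq.values())
    series = {
        "word_len": [len(w) for w in words],
        "sent_len": [len(s) for s in sentences],
        "syllables": [syllables(w) for w in words],
        # Information content in bits: -log2 of relative frequency.
        # Hypothetical choice: unseen words are treated as having count 1.
        "info": [-math.log2(word_freq.get(w, 1) / total) for w in words],
    }
    feats = {}
    for name, values in series.items():
        feats[name + "_mean"] = mean(values)
        feats[name + "_var"] = pvariance(values)
    return feats


def fit_polynomial(xs, ys, degree):
    """Least-squares fit via the normal equations.

    Returns coefficients ordered from the constant term upwards.
    Degree 1 gives a linear model; higher degrees give polynomial models.
    """
    n = degree + 1
    # Augmented matrix [X^T X | X^T y] for the Vandermonde design matrix X.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


if __name__ == "__main__":
    toy_freq = {"аз": 5, "чета": 2, "ти": 4}  # toy counts, not real corpus data
    print(extract_features("Аз чета. Ти пишеш писмо!", toy_freq))
    # Synthetic (feature value, grade) pairs, purely illustrative.
    xs, grades = [1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0]
    for degree in (1, 2):
        coeffs = fit_polynomial(xs, grades, degree)
        sse = sum((predict(coeffs, x) - g) ** 2 for x, g in zip(xs, grades))
        print(degree, sse)
```

This sketch fits a single feature at a time to keep the regression readable; a model over all eight moments would need a multivariate design matrix, and the normal-equation solver used here becomes numerically fragile at higher degrees, where a dedicated linear-algebra library would be the safer choice.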