Robert Joshua Reynolds
Also published as: Robert Reynolds
2026
Transformer-based readability classifiers are worse than you think: Evidence from cross-domain Arabic readability assessment
Sarh Alzu’Bi | Robert Reynolds
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Arabic readability assessment is under-explored compared to English, and existing models are typically evaluated only within the training domain. We introduce the Jordanian School Textbook Corpus (JSTC), 82,512 segments from 240 textbooks spanning grades 1–12, and combine it with DARES to train XGBoost classifiers, fine-tuned CAMeLBERT transformers, and hybrid architectures evaluated both in-domain and on the BAREC out-of-domain benchmark. CAMeLBERT achieves strong in-domain performance (QWK = 0.830), but its cross-domain QWK collapses to 0.085, while XGBoost over 127 handcrafted linguistic features alone maintains the highest cross-domain QWK (0.240); adding [CLS] embeddings to those features actively harms transfer. Probing reveals that CAMeLBERT layers implicitly capture some linguistic features but that higher-level signals overwhelm them, and Captum attribution identifies nouns and nominal particles such as al- as the most important tokens. The results argue for prioritizing linguistically grounded features over contextual embeddings when cross-domain robustness is required.
2025
UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment
Joseph Marvin Imperial | Abdullah Barayan | Regina Stodden | Rodrigo Wilkens | Ricardo Muñoz Sánchez | Lingyun Gao | Melissa Torgbi | Dawn Knight | Gail Forey | Reka R. Jablonkai | Ekaterina Kochmar | Robert Joshua Reynolds | Eugénio Ribeiro | Horacio Saggion | Elena Volodina | Sowmya Vajjala | Thomas François | Fernando Alva-Manchego | Harish Tayyar Madabushi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
We introduce UniversalCEFR, a large-scale multilingual multidimensional dataset of texts annotated according to the CEFR (Common European Framework of Reference) scale in 13 languages. To enable open research in both automated readability and language proficiency assessment, UniversalCEFR comprises 505,807 CEFR-labeled texts curated from educational and learner-oriented resources, standardized into a unified data format to support consistent processing, analysis, and modeling across tasks and languages. To demonstrate its utility, we conduct benchmark experiments using three modelling paradigms: a) linguistic feature-based classification, b) fine-tuning pre-trained LLMs, and c) descriptor-based prompting of instruction-tuned LLMs. Our results further support using linguistic features and fine-tuning pretrained models in multilingual CEFR level assessment. Overall, UniversalCEFR aims to establish best practices in data distribution in language proficiency research by standardising dataset formats and promoting their accessibility to the global research community.
2016
Insights from Russian second language readability classification: complexity-dependent training requirements, and feature evaluation of multiple categories
Robert Reynolds
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications
2015
Automatic word stress annotation of Russian unrestricted text
Robert Reynolds | Francis Tyers
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)