Abdullah Barayan


2025

Analysing Zero-Shot Readability-Controlled Sentence Simplification
Abdullah Barayan | Jose Camacho-Collados | Fernando Alva-Manchego
Proceedings of the 31st International Conference on Computational Linguistics

Readability-controlled text simplification (RCTS) rewrites texts to lower readability levels while preserving their meaning. RCTS models often depend on parallel corpora with readability annotations on both source and target sides. Such datasets are scarce and difficult to curate, especially at the sentence level. To reduce reliance on parallel data, we explore using instruction-tuned large language models for zero-shot RCTS. Through automatic and manual evaluations, we examine: (1) how different types of contextual information affect a model’s ability to generate sentences with the desired readability, and (2) the trade-off between achieving target readability and preserving meaning. Results show that all tested models struggle to simplify sentences (especially to the lowest levels) due to both model limitations and characteristics of the source sentences that impede adequate rewriting. Our experiments also highlight the need for better automatic evaluation metrics tailored to RCTS, as standard ones often misinterpret common simplification operations and inaccurately assess readability and meaning preservation.
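As a rough illustration of the zero-shot setup this abstract describes, the sketch below composes a readability-controlled instruction for an instruction-tuned LLM. The template and CEFR descriptors are assumptions for illustration, not the paper's actual prompts; the resulting string would be passed to any chat-style model API.

```python
# Minimal sketch of a zero-shot RCTS prompt for an instruction-tuned LLM.
# The template and descriptors are illustrative, not the paper's prompts.

CEFR_DESCRIPTORS = {
    "A1": "very short sentences, the most common words, simple clauses",
    "A2": "short sentences, everyday vocabulary, basic connectors",
    "B1": "clear standard language on familiar topics",
}

def build_rcts_prompt(sentence: str, target_level: str) -> str:
    """Compose a zero-shot instruction for readability-controlled simplification."""
    descriptor = CEFR_DESCRIPTORS[target_level]
    return (
        f"Rewrite the sentence below so that a CEFR {target_level} reader "
        f"({descriptor}) can understand it. Preserve the original meaning.\n\n"
        f"Sentence: {sentence}\nSimplified:"
    )

print(build_rcts_prompt(
    "The committee postponed the ratification of the treaty indefinitely.",
    "A2",
))
```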

UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment
Joseph Marvin Imperial | Abdullah Barayan | Regina Stodden | Rodrigo Wilkens | Ricardo Muñoz Sánchez | Lingyun Gao | Melissa Torgbi | Dawn Knight | Gail Forey | Reka R. Jablonkai | Ekaterina Kochmar | Robert Joshua Reynolds | Eugénio Ribeiro | Horacio Saggion | Elena Volodina | Sowmya Vajjala | Thomas François | Fernando Alva-Manchego | Harish Tayyar Madabushi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We introduce UniversalCEFR, a large-scale, multilingual, multidimensional dataset of texts annotated according to the CEFR (Common European Framework of Reference) scale in 13 languages. To enable open research in both automated readability and language proficiency assessment, UniversalCEFR comprises 505,807 CEFR-labeled texts curated from educational and learner-oriented resources, standardized into a unified data format to support consistent processing, analysis, and modeling across tasks and languages. To demonstrate its utility, we conduct benchmark experiments using three modelling paradigms: a) linguistic feature-based classification, b) fine-tuning pre-trained LLMs, and c) descriptor-based prompting of instruction-tuned LLMs. Our results further support using linguistic features and fine-tuning pre-trained models in multilingual CEFR level assessment. Overall, UniversalCEFR aims to establish best practices in data distribution in language proficiency research by standardising dataset formats and promoting their accessibility to the global research community.
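A toy sketch of paradigm (a), linguistic feature-based classification: two surface features (mean sentence length and mean word length) feed a scikit-learn classifier. The features, training texts, and labels here are made-up assumptions; the benchmark baselines use far richer linguistic features.

```python
# Toy illustration of feature-based CEFR classification; not the paper's setup.
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

def features(text: str) -> list[float]:
    """Two crude readability features: words per sentence, characters per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    mean_sent_len = len(words) / max(len(sentences), 1)
    mean_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return [mean_sent_len, mean_word_len]

# Hypothetical mini training set of (text, CEFR label) pairs.
train = [
    ("The cat sat on the mat. It was happy.", "A1"),
    ("Researchers argue that institutional reform requires sustained effort.", "B2"),
]
X = np.array([features(t) for t, _ in train])
y = [label for _, label in train]

clf = LogisticRegression().fit(X, y)
print(clf.predict(np.array([features("I like my dog. He is small.")])))
```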

Findings of the TSAR 2025 Shared Task on Readability-Controlled Text Simplification
Fernando Alva-Manchego | Regina Stodden | Joseph Marvin Imperial | Abdullah Barayan | Kai North | Harish Tayyar Madabushi
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

This paper presents the findings of the first Shared Task on Readability-Controlled Text Simplification at TSAR 2025. The task required systems to simplify English texts to specific target readability levels of the Common European Framework of Reference for Languages (CEFR). We received 48 submissions from 20 participating teams, with approaches predominantly based on large language models (LLMs), which included iterative refinement, multi-agent setups, and LLM-as-a-judge pipelines. For this shared task, we developed a new dataset of pedagogical texts and evaluated submissions using a weighted combination of semantic similarity and CEFR-level accuracy. The results of the participating teams demonstrate that while LLMs can perform well on this task, dependable, controlled simplification often requires complex, iterative processes. Our findings also suggest that the capabilities of current systems are beginning to saturate existing automatic evaluation metrics, underscoring the need to re-evaluate these metrics and their practicality.
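The abstract describes scoring submissions with a weighted combination of semantic similarity and CEFR-level accuracy. A minimal sketch of such a blend is below; the 0.5 weight and the binary level-accuracy term are assumptions for illustration, not the official shared-task implementation.

```python
# Sketch of a weighted score blending meaning preservation with target-level
# accuracy, in the spirit of the shared-task metric (weights are assumed).

def combined_score(similarity: float, predicted_level: str,
                   target_level: str, w_sim: float = 0.5) -> float:
    """Blend a 0-1 semantic similarity with a binary CEFR-level hit."""
    level_accuracy = 1.0 if predicted_level == target_level else 0.0
    return w_sim * similarity + (1.0 - w_sim) * level_accuracy

# Example: a simplification that preserves meaning well (similarity 0.92)
# but lands one CEFR level above the requested target.
print(combined_score(0.92, predicted_level="B1", target_level="A2"))
```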