Readability Reconsidered: A Cross-Dataset Analysis of Reference-Free Metrics
Catarina Belem | Parker Glenn | Alfy Samuel | Anoop Kumar | Daben Liu
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)
Automatic readability assessment plays a key role in ensuring effective communication between humans and language models. Despite significant progress, the field is hindered by inconsistent definitions of readability and by measurements that rely on surface-level text properties. In this work, we investigate the factors shaping human perceptions of readability through an analysis of 1.2k judgments, finding that, beyond surface-level cues, information content and topic strongly shape text comprehensibility. Furthermore, we evaluate 15 popular readability metrics across 5 datasets, contrasting them with 5 more nuanced model-based metrics. Our results show that four model-based metrics consistently place among the top 4 in rank correlations with human judgments, while the best-performing traditional metric achieves an average rank of only 7.8. These findings highlight a mismatch between current readability metrics and human perceptions, pointing to model-based approaches as a more promising direction.
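The evaluation protocol described above compares metric scores against human judgments via rank correlation. A minimal sketch of that comparison is shown below, assuming Spearman's ρ as the rank-correlation coefficient (the abstract does not name one) and using purely illustrative scores; `metric_scores` and `human_judgments` are hypothetical placeholders, not data from the paper.

```python
# Sketch: rank-correlate a readability metric's scores with human
# judgments, as in the evaluation the abstract describes.
# Spearman's rho is an assumption; the coefficient is not specified.
from scipy.stats import spearmanr

# Hypothetical per-text scores from a traditional (formula-based)
# readability metric and the matched human readability judgments.
metric_scores = [72.1, 55.4, 80.3, 61.0, 45.2]
human_judgments = [4.0, 2.5, 4.5, 3.0, 2.0]

rho, p_value = spearmanr(metric_scores, human_judgments)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```

Ranking each metric by such correlations per dataset, then averaging ranks across datasets, would yield average-rank summaries like the 7.8 figure reported for the best traditional metric.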