Modeling Chinese L2 Writing Development: The LLM-Surprisal Perspective

Jingying Hu, Yan Cong


Abstract
LLM-surprisal is a computational measure of how unexpected a word or character is given the preceding context, as estimated by large language models (LLMs). This study investigated the effectiveness of LLM-surprisal in modeling second language (L2) writing development, focusing on Chinese L2 writing as a case to test its cross-linguistic generalizability. We selected three types of LLMs with different pretraining settings: a multilingual model trained on various languages, a Chinese-general model trained on both Simplified and Traditional Chinese, and a Traditional-Chinese-specific model. This comparison allowed us to explore how model architecture and training data affect LLM-surprisal estimates of learners’ essays written in Traditional Chinese, which in turn influence the modeling of L2 proficiency and development. We also correlated LLM-surprisals with 16 classic linguistic complexity indices (e.g., character sophistication, lexical diversity, syntactic complexity, and discourse coherence) to evaluate their interpretability and validity as measures for L2 writing assessment. Our findings demonstrate the potential of LLM-surprisal as a robust, interpretable, cross-linguistically applicable metric for automatic writing assessment and contribute to bridging computational and linguistic approaches to understanding and modeling L2 writing development. All analysis scripts are available at https://github.com/JingyingHu/ChineseL2Writing-Surprisals.
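The surprisal measure the abstract describes is standardly defined as the negative log-probability a model assigns to a token given its preceding context. A minimal sketch of that definition, using a hypothetical toy next-character distribution in place of actual LLM-derived probabilities (the paper itself obtains these from pretrained LLMs):

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 of the probability assigned to the token."""
    return -math.log2(prob)

# Hypothetical next-character distribution a model might assign
# given some Chinese context; real values would come from an LLM.
next_char_probs = {"的": 0.5, "了": 0.25, "貓": 0.125}

# A highly predictable character carries low surprisal;
# a less expected one carries more.
print(surprisal(next_char_probs["的"]))  # 1.0 bit
print(surprisal(next_char_probs["貓"]))  # 3.0 bits
```

In practice, a per-essay score is typically an aggregate (e.g., the mean) of per-token surprisals over the learner's text, which is what allows comparison across proficiency levels.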
Anthology ID:
2025.cmcl-1.22
Volume:
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, USA
Editors:
Tatsuki Kuribayashi, Giulia Rambelli, Ece Takmaz, Philipp Wicke, Jixing Li, Byung-Doh Oh
Venues:
CMCL | WS
Publisher:
Association for Computational Linguistics
Pages:
172–183
URL:
https://preview.aclanthology.org/landing_page/2025.cmcl-1.22/
Cite (ACL):
Jingying Hu and Yan Cong. 2025. Modeling Chinese L2 Writing Development: The LLM-Surprisal Perspective. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 172–183, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Modeling Chinese L2 Writing Development: The LLM-Surprisal Perspective (Hu & Cong, CMCL 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.cmcl-1.22.pdf