Modeling Chinese L2 Writing Development: The LLM-Surprisal Perspective

Jingying Hu, Yan Cong


Abstract
LLM-surprisal is a computational measure of how unexpected a word or character is given the preceding context, as estimated by large language models (LLMs). This study investigated the effectiveness of LLM-surprisal in modeling second language (L2) writing development, focusing on Chinese L2 writing as a case to test its cross-linguistic generalizability. We selected three types of LLMs with different pretraining settings: a multilingual model trained on various languages, a Chinese-general model trained on both Simplified and Traditional Chinese, and a Traditional-Chinese-specific model. This comparison allowed us to explore how model architecture and training data affect LLM-surprisal estimates of learners’ essays written in Traditional Chinese, which in turn influence the modeling of L2 proficiency and development. We also correlated LLM-surprisals with 16 classic linguistic complexity indices (e.g., character sophistication, lexical diversity, syntactic complexity, and discourse coherence) to evaluate their interpretability and validity as measures for L2 writing assessment. Our findings demonstrate the potential of LLM-surprisal as a robust, interpretable, cross-linguistically applicable metric for automatic writing assessment and contribute to bridging computational and linguistic approaches to understanding and modeling L2 writing development. All analysis scripts are available at https://github.com/JingyingHu/ChineseL2Writing-Surprisals.
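The surprisal measure the abstract describes is standardly defined as the negative log-probability a model assigns to a token given its preceding context. A minimal sketch of that definition, using a hypothetical toy next-character distribution in place of actual LLM-derived probabilities (the paper itself obtains these from pretrained LLMs):

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 of the probability assigned to the token."""
    return -math.log2(prob)

# Hypothetical next-character distribution a model might assign
# given some Chinese context; real values would come from an LLM.
next_char_probs = {"的": 0.5, "了": 0.25, "貓": 0.125}

# A highly predictable character carries low surprisal;
# a less expected one carries more.
print(surprisal(next_char_probs["的"]))  # 1.0 bit
print(surprisal(next_char_probs["貓"]))  # 3.0 bits
```

In practice, a per-essay score is typically an aggregate (e.g., the mean) of per-token surprisals over the learner's text, which is what allows comparison across proficiency levels.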
Anthology ID:
2025.cmcl-1.22
Volume:
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, USA
Editors:
Tatsuki Kuribayashi, Giulia Rambelli, Ece Takmaz, Philipp Wicke, Jixing Li, Byung-Doh Oh
Venues:
CMCL | WS
Publisher:
Association for Computational Linguistics
Pages:
172–183
URL:
https://preview.aclanthology.org/landing_page/2025.cmcl-1.22/
Cite (ACL):
Jingying Hu and Yan Cong. 2025. Modeling Chinese L2 Writing Development: The LLM-Surprisal Perspective. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 172–183, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Modeling Chinese L2 Writing Development: The LLM-Surprisal Perspective (Hu & Cong, CMCL 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.cmcl-1.22.pdf