2025
Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases
Rena Gao | Xuetong Wu | Tatsuki Kuribayashi | Mingrui Ye | Siya Qi | Carsten Roever | Yuanxing Liu | Zheng Yuan | Jey Han Lau
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This study evaluates Large Language Models’ (LLMs) ability to simulate the non-native-like English of human second language (L2) learners, whose usage is shaped by interference from their native first language (L1). In dialogue-based interviews, we prompt LLMs to mimic L2 English learners with specific L1s (e.g., Japanese, Thai, Urdu) across seven languages, and compare their outputs to real L2 learner data. Our analysis examines L1-driven linguistic biases, such as reference-word usage and avoidance behaviors, using information-theoretic and distributional density measures. Results show that modern LLMs (e.g., Qwen2.5, Llama 3, DeepSeek-V3, GPT-4o) replicate the L1-dependent patterns observed in human L2 data, with distinct influences from different L1s (e.g., Japanese, Korean, and Mandarin significantly affect tense agreement, while Urdu influences noun-verb collocations). These results point to LLMs’ potential for L2 dialogue generation and evaluation in future educational applications.
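To make the distributional comparison concrete, here is a minimal, hypothetical sketch (not the paper’s released code) of one information-theoretic measure of the kind the abstract mentions: Jensen-Shannon divergence between reference-word distributions in human L2 dialogues and LLM-simulated ones. The corpora and the reference-word set below are toy placeholders.

```python
# Hypothetical sketch: compare reference-word distributions from human L2
# dialogues and LLM-simulated dialogues via Jensen-Shannon divergence.
import math
from collections import Counter

REFERENCE_WORDS = {"he", "she", "it", "they", "this", "that"}  # assumed set

def reference_word_distribution(utterances):
    """Normalised frequency of each reference word across a corpus."""
    counts = Counter(
        tok for utt in utterances for tok in utt.lower().split()
        if tok in REFERENCE_WORDS
    )
    total = sum(counts.values()) or 1
    return {w: counts[w] / total for w in REFERENCE_WORDS}

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two distributions over the same keys."""
    m = {k: 0.5 * (p[k] + q[k]) for k in p}
    def kl(a, b):
        return sum(a[k] * math.log((a[k] + eps) / (b[k] + eps)) for k in a)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

human_l2 = ["she said that it was fine", "he think this is good"]
llm_sim = ["she says that that is fine", "they think this good"]
print(js_divergence(reference_word_distribution(human_l2),
                    reference_word_distribution(llm_sim)))
```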
From Posts to Timelines: Modeling Mental Health Dynamics from Social Media Timelines with Hybrid LLMs
Zimu Wang | Hongbin Na | Rena Gao | Jiayuan Ma | Yining Hua | Ling Chen | Wei Wang
Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025)
Social media data is recognized for its usefulness in the early detection of mental disorders; however, little research has focused on modeling individuals’ longitudinal mental health dynamics. Moreover, fine-tuning large language models (LLMs) on large-scale annotated datasets is challenging due to privacy concerns and the difficulty of data collection and annotation. In this paper, we propose a novel approach to modeling mental health dynamics with hybrid LLMs: we first apply both classification-based and generation-based models to identify adaptive and maladaptive evidence in individual posts, then use this evidence to predict well-being scores and generate post-level and timeline-level summaries. Experimental results on the CLPsych 2025 shared task demonstrate the effectiveness of our method, with the generation-based model showing a marked advantage in evidence identification.
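As a rough illustration of a hybrid pipeline of this shape (a sketch under assumed interfaces, not the authors’ models), `classify_evidence` and `generate_evidence` below are hypothetical stand-ins for a fine-tuned classifier and a prompted generative LLM; pooled evidence then drives a toy timeline-level well-being score.

```python
# Illustrative sketch only: hybrid evidence extraction feeding a toy score.
from dataclasses import dataclass

@dataclass
class Evidence:
    span: str
    polarity: str  # "adaptive" or "maladaptive"

def classify_evidence(post: str) -> list[Evidence]:
    # Placeholder for a classification-based model.
    return [Evidence(post, "maladaptive")] if "hopeless" in post else []

def generate_evidence(post: str) -> list[Evidence]:
    # Placeholder for a generation-based (prompted LLM) model.
    return [Evidence(post, "adaptive")] if "friends" in post else []

def wellbeing_score(posts: list[str]) -> float:
    """Toy timeline-level score: share of adaptive evidence."""
    evidence = [e for p in posts
                for e in classify_evidence(p) + generate_evidence(p)]
    if not evidence:
        return 0.5
    adaptive = sum(e.polarity == "adaptive" for e in evidence)
    return adaptive / len(evidence)

timeline = ["I feel hopeless today", "met friends, felt a bit better"]
print(wellbeing_score(timeline))
```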
Interaction Matters: An Evaluation Framework for Interactive Dialogue Assessment on English Second Language Conversations
Rena Gao | Carsten Roever | Jey Han Lau
Proceedings of the 31st International Conference on Computational Linguistics
We present an evaluation framework for interactive dialogue assessment in the context of English as a Second Language (ESL) speakers. Our framework collects dialogue-level interactivity labels (e.g., topic management; 4 labels in total) and micro-level span features (e.g., backchannels; 17 features in total). Using our annotated data, we study how the micro-level features influence the higher-level interactivity quality of ESL dialogues by constructing various machine-learning models. Our results demonstrate that certain micro-level features, such as reference words (e.g., she, her, he), strongly correlate with interactivity quality, revealing new insights into the interaction between higher-level dialogue quality and lower-level linguistic signals. Our framework also provides a means of assessing ESL communication, which is useful for language assessment.
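A toy sketch of the kind of correlation analysis described (invented illustrative data, not the paper’s annotations): how one micro-level span feature, per-dialogue reference-word counts, might be related to an annotated dialogue-level interactivity rating.

```python
# Toy sketch: Pearson correlation between a micro-level feature and a
# dialogue-level interactivity label. Both lists are invented.
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

reference_word_counts = [2, 5, 1, 8, 4, 7]  # micro-level feature per dialogue
interactivity_scores = [1, 3, 1, 4, 2, 4]   # annotated dialogue-level label
print(f"r = {pearson(reference_word_counts, interactivity_scores):.2f}")
```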
Moderation Matters: Measuring Conversational Moderation Impact in English as a Second Language Group Discussion
Rena Gao | Ming-Bin Chen | Lea Frermann | Jey Han Lau
Findings of the Association for Computational Linguistics: ACL 2025
English as a Second Language (ESL) speakers often struggle to engage in group discussions due to language barriers. While moderators can facilitate participation, few studies assess conversational engagement or evaluate moderation effectiveness. To address this gap, we develop a dataset of 17 sessions from an online ESL conversation club, comprising both moderated and non-moderated discussions. We then introduce an approach that integrates automatic ESL dialogue assessment with a framework for categorizing moderation strategies. Our findings indicate that moderators help improve topic flow and the opening and closing of conversations. Interestingly, we find active acknowledgement and encouragement to be the most effective moderation strategies, while excessive information and opinion sharing by moderators has a negative impact. Ultimately, our study paves the way for analyzing ESL group discussions and the role of moderators in non-native conversation settings. Code and data are available at https://github.com/RenaGao/L2Moderator.
An Interpretable and Crosslingual Method for Evaluating Second-Language Dialogues
Rena Gao | Jingxuan Wu | Xuetong Wu | Carsten Roever | Jing Wu | Long Lv | Jey Han Lau
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
We analyse the cross-lingual transferability of a dialogue evaluation framework that assesses the relationships between micro-level linguistic features (e.g., backchannels) and macro-level interactivity labels (e.g., topic management), originally designed for English-as-a-second-language dialogues. To this end, we develop CNIMA (Chinese Non-Native Interactivity Measurement and Automation), a Chinese-as-a-second-language labelled dataset of 10K dialogues. We find the evaluation framework to be robust across languages, revealing both language-specific and language-universal relationships between micro-level and macro-level features. Next, we propose an automated, interpretable approach with low data requirements that scores the overall quality of a second-language dialogue based on the framework. Our approach is interpretable in that it reveals the key linguistic and interactivity features that contribute to the overall quality score. As our approach does not require labelled data, it can also be adapted to other languages for second-language dialogue evaluation.
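As a rough illustration of what an interpretable scorer of this kind might look like (a sketch under assumed feature names and weights, not the paper’s fitted model), a linear combination of micro-level feature counts lets each weighted term double as an explanation of the overall score.

```python
# Hedged sketch: interpretable linear scoring of a second-language dialogue.
# Feature names and weights are illustrative assumptions.
FEATURE_WEIGHTS = {"backchannels": 0.4, "reference_words": 0.3, "repairs": -0.2}

def score_dialogue(features: dict[str, float]):
    """Return an overall quality score plus per-feature contributions."""
    contributions = [(name, FEATURE_WEIGHTS.get(name, 0.0) * value)
                     for name, value in features.items()]
    total = sum(c for _, c in contributions)
    # Sort by absolute contribution so the top entries explain the score.
    return total, sorted(contributions, key=lambda t: -abs(t[1]))

score, why = score_dialogue({"backchannels": 5, "reference_words": 3, "repairs": 2})
print(score, why)
```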
2024
Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Michael Hahn | Alexey Sorokin | Ritesh Kumar | Andreas Shcherbakov | Yulia Otmakhova | Jinrui Yang | Oleg Serikov | Priya Rani | Edoardo M. Ponti | Saliha Muradoğlu | Rena Gao | Ryan Cotterell | Ekaterina Vylomova