Carsten Roever


2025

Interaction Matters: An Evaluation Framework for Interactive Dialogue Assessment on English Second Language Conversations
Rena Gao | Carsten Roever | Jey Han Lau
Proceedings of the 31st International Conference on Computational Linguistics

We present an evaluation framework for interactive dialogue assessment in the context of English as a Second Language (ESL) speakers. Our framework collects dialogue-level interactivity labels (e.g., topic management; 4 labels in total) and micro-level span features (e.g., backchannels; 17 features in total). Given our annotated data, we study how the micro-level features influence the (higher-level) interactivity quality of ESL dialogues by constructing various machine learning-based models. Our results demonstrate that certain micro-level features, such as reference words (e.g., she, her, he), strongly correlate with interactivity quality, revealing new insights into the interaction between higher-level dialogue quality and lower-level fundamental linguistic signals. Our framework also provides a means to assess ESL communication, making it useful for language assessment.
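To make the framework's core idea concrete, here is a minimal, hypothetical sketch of the kind of machine-learning probe the abstract describes: fitting a simple classifier from per-dialogue counts of micro-level features to a macro-level interactivity label and inspecting the learned weights. The feature names, toy data, and choice of logistic regression are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: probing how micro-level span features relate to a
# dialogue-level interactivity label, in the spirit of the framework above.
# Feature names, data, and the model choice are illustrative assumptions,
# not the authors' implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row holds per-dialogue counts of micro-level features
# (the paper uses 17 features; this is a toy subset).
FEATURES = ["backchannels", "reference_words", "repair_markers"]
X = np.array([
    [5, 12, 0],   # dialogue 1
    [1,  3, 4],   # dialogue 2
    [7, 15, 1],   # dialogue 3
    [0,  2, 6],   # dialogue 4
])
# Binary macro-level label, e.g., high vs. low topic-management quality.
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
# Coefficient signs and magnitudes give a rough view of which
# micro-level features move the interactivity judgement.
for name, coef in zip(FEATURES, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```

On real data, one would replace the toy counts with the annotated span features and repeat per interactivity label; correlation-style findings such as the reference-word result could then be read off the fitted weights.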

An Interpretable and Crosslingual Method for Evaluating Second-Language Dialogues
Rena Gao | Jingxuan Wu | Xuetong Wu | Carsten Roever | Jing Wu | Long Lv | Jey Han Lau
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

We analyse the cross-lingual transferability of a dialogue evaluation framework that assesses the relationships between micro-level linguistic features (e.g., backchannels) and macro-level interactivity labels (e.g., topic management), originally designed for English-as-a-second-language dialogues. To this end, we develop CNIMA (Chinese Non-Native Interactivity Measurement and Automation), a labelled Chinese-as-a-second-language dataset of 10K dialogues. We find the evaluation framework to be robust across languages, revealing both language-specific and language-universal relationships between micro-level and macro-level features. Next, we propose an automated, interpretable approach with low data requirements that scores the overall quality of a second-language dialogue based on the framework. Our approach is interpretable in that it reveals the key linguistic and interactivity features that contribute to the overall quality score. As our approach does not require labelled data, it can also be adapted to other languages for second-language dialogue evaluation.
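The abstract's interpretable, low-data scorer suggests a two-step pipeline in which micro-level feature scores feed macro-level interactivity scores, which are then aggregated into an overall quality score whose components remain inspectable. Below is a minimal sketch of such a pipeline; the weights, feature names, and aggregation rule are invented for illustration and are not the paper's method.

```python
# Hypothetical sketch of an interpretable two-step scorer: micro-level
# feature scores feed macro-level interactivity scores, which combine
# into an overall quality score. Weights and feature names are invented
# for illustration; they are not the paper's learned values.
from dataclasses import dataclass

@dataclass
class DialogueScores:
    micro: dict[str, float]  # e.g., {"backchannels": 0.8, ...}

# Assumed micro -> macro weights (one macro label shown for brevity;
# the framework uses four interactivity labels).
MACRO_WEIGHTS = {
    "topic_management": {"backchannels": 0.3, "reference_words": 0.7},
}

def score_dialogue(scores: DialogueScores) -> tuple[float, dict[str, float]]:
    """Return the overall quality plus per-macro-label scores, so the
    result stays inspectable rather than a black-box number."""
    macro = {
        label: sum(w * scores.micro.get(f, 0.0) for f, w in weights.items())
        for label, weights in MACRO_WEIGHTS.items()
    }
    overall = sum(macro.values()) / len(macro)
    return overall, macro

overall, macro = score_dialogue(
    DialogueScores(micro={"backchannels": 0.8, "reference_words": 0.6})
)
print(overall, macro)
```

Because the macro-level scores are simple weighted sums of named micro-level features, each contribution can be traced back, which is the sense of interpretability the abstract describes; swapping in another language's feature inventory only requires changing the feature extractors and weights.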