Xiaobin Chen
2026
TeamXBC at BEA 2026 Shared Task 1: How AI (and I) won the shared task: Vibe and agentic coding solutions for practical machine learning problems
Xiaobin Chen
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
The paper describes how the author used AI coding agents and a technique called vibe coding to successfully tackle the BEA 2026 shared task on vocabulary difficulty prediction. Three sets of predictions (runs) were submitted to the competition, corresponding to three experiments in which the author gave the coding agent different levels of agency: (1) a one-off solution fully planned and implemented by the AI, (2) an iterative process, self-directed by the AI, that ran for 24 hours, and (3) a collaborative human-in-the-loop process in which solutions were discussed between the author and the AI. Competition results showed that the collaborative mode delivered the best performance, demonstrating that, at the current stage, domain-expert input and decision making are important and necessary for vibe coding solutions to practical machine learning problems.
From Dialogue to Learner Modeling: Identifying Candidate Signals of Productive Use in LLM-Based Grammar Practice
Luisa Ribeiro-Flucht | Lanhua Huang | Xiaobin Chen
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Adaptive language-learning systems often model progress through correctness in constrained exercises, where the target response is predefined. In dialogue-based tutors, by contrast, learners can respond appropriately in many ways, making evidence of progress harder to interpret. This raises a learner-modeling problem: determining whether learner production provides useful evidence of progress, which aspects are informative, and how they might support adaptation. We address this problem using pilot data from an LLM-based English grammar tutor, comprising 40 pre- and post-test tasks, treatment interactions, and 2,406 learner messages. We propose a coding scheme for learner production in dialogue and explore whether the resulting evidence types can support future adaptive decisions. Findings show that learner production in dialogue can support adaptive grammar practice: prior target use predicted short-term performance, while finer-grained evidence helped distinguish different levels of productive control. We discuss implications for adaptive grammar-based dialogue systems that use learner production to support communicative practice.
Using LLMs for item creation: Validating the potential of automatically generated sentence repetition test items for language assessment
Sarah Löber | Björn Rudzewitz | Yuan Chu | Mengyuan He | Shiqin Liu | Yushan Ye | Xiaobin Chen
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Various aspects of the Elicited Imitation Test (EIT), a sentence repetition task for language assessment, can be automated, for example test administration or scoring. It may also be possible to generate test items with Large Language Models (LLMs). This study investigates the potential of GPT-4o for item creation in the context of the EIT by creating a parallel form to two popular, validated tests. We analysed the tests in terms of their linguistic and psychometric properties. While the items created by the LLM show some differences in grammatical structures compared to human-written items, linguistic complexity did not differ significantly between tests. Psychometric properties showed only minor differences. These findings lend support to the potential of Automatic Item Generation with LLMs for sentence repetition tasks and might support standardisation in SLA research and testing by enabling parallel test creation.
2025
A Framework for Proficiency-Aligned Grammar Practice in LLM-Based Dialogue Systems
Luisa Ribeiro-Flucht | Xiaobin Chen | Detmar Meurers
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Communicative practice is critical for second language development, yet learners often lack targeted, engaging opportunities to use new grammar structures. While large language models (LLMs) can offer coherent interactions, they are not inherently aligned with pedagogical goals or proficiency levels. In this paper, we explore how LLMs can be integrated into a structured framework for contextually constrained, grammar-focused interaction, building on an existing goal-oriented dialogue system. Through controlled simulations, we evaluate five LLMs across 75 A2-level tasks under two conditions: (i) grammar-targeted, task-anchored prompting and (ii) the addition of a lightweight post-generation validation pipeline using a grammar annotator. Our findings show that template-based prompting alone substantially increases target-form coverage, up to 91.4% for LLaMA 3.1-70B-Instruct, while reducing overly advanced grammar usage. The validation pipeline provides an additional boost in form-focused tasks, raising coverage to 96.3% without significantly degrading appropriateness.
2024
Developing a Pedagogically Oriented Interactive Reading Tool with Teachers in the Loops
Mihwa Lee | Björn Rudzewitz | Xiaobin Chen
Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning
Developing a Web-Based Intelligent Language Assessment Platform Powered by Natural Language Processing Technologies
Sarah Löber | Björn Rudzewitz | Daniela Verratti Souto | Luisa Ribeiro-Flucht | Xiaobin Chen
Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning
Explainable AI in Language Learning: Linking Empirical Evidence and Theoretical Concepts in Proficiency and Readability Modeling of Portuguese
Luisa Ribeiro-Flucht | Xiaobin Chen | Detmar Meurers
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)
While machine learning methods have supported significantly improved results in education research, a common deficiency lies in the explainability of the results. Explainable AI (XAI) aims to fill that gap by providing transparent, conceptually understandable explanations for classification decisions, enhancing human comprehension of and trust in the outcomes. This paper explores an XAI approach to proficiency and readability assessment employing a comprehensive set of 465 linguistic complexity measures. We identify theoretical descriptions associating such measures with varying levels of proficiency and readability and validate them in cross-corpus experiments employing supervised machine learning and Shapley Additive Explanations. The results not only highlight the utility of a diverse set of complexity measures for effectively modeling proficiency and readability in Portuguese, achieving a state-of-the-art accuracy of 0.70 in the proficiency classification task and of 0.84 in the readability classification task, but also largely corroborate the theoretical research assumptions, especially in the lexical domain.
2022
CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform
Yue Cui | Junhui Zhu | Liner Yang | Xuezhi Fang | Xiaobin Chen | Yujie Wang | Erhong Yang
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The construct of linguistic complexity has been widely used in language learning research, and several text analysis tools have been created to analyze it automatically. However, the indexes supported by existing Chinese text analysis tools are limited and vary, because the tools were built for different research purposes. CTAP is an open-source tool for extracting linguistic complexity measures that is designed to serve a wide range of research purposes. Although it was originally developed for English, the Unstructured Information Management (UIMA) framework it uses allows other languages to be integrated. In this study, we integrated a Chinese component into CTAP, describe the index set it incorporates, and compare it with three linguistic complexity tools for Chinese. The index set comprises 196 linguistic complexity indexes at four levels: character, word, sentence, and discourse. CTAP now supports automatic calculation of complexity features for four languages, aiming to help linguists without an NLP background study language complexity.
2021
Using Broad Linguistic Complexity Modeling for Cross-Lingual Readability Assessment
Zarah Weiss | Xiaobin Chen | Detmar Meurers
Proceedings of the 10th Workshop on NLP for Computer Assisted Language Learning
2017
Challenging learners in their individual zone of proximal development using pedagogic developmental benchmarks of syntactic complexity
Xiaobin Chen | Detmar Meurers
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition
2016
CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis
Xiaobin Chen | Detmar Meurers
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Informed by research on readability and language acquisition, computational linguists have developed sophisticated tools for the analysis of linguistic complexity. While some tools are starting to become accessible on the web, there is still a disconnect between the features that can in principle be identified with state-of-the-art computational linguistic analysis and the analyses that a second language acquisition researcher, teacher, or textbook writer can readily obtain and visualize for their own collection of texts. This short paper presents the development of a web-based tool that aims to meet this challenge. The Common Text Analysis Platform (CTAP) is designed to support fully configurable linguistic feature extraction for a wide range of complexity analyses. It features a user-friendly interface, modularized and reusable integration of analysis components, and flexible corpus and feature management. Building on the Unstructured Information Management framework (UIMA), CTAP readily supports integration of state-of-the-art NLP and complexity feature extraction while maintaining modularization and reusability. CTAP thereby aims to provide a common platform for complexity analysis, encouraging research collaboration and the sharing of feature extraction components, in order to jointly advance the state of the art in complexity analysis in a form that readily supports real-life use by ordinary users.