Yu-Chien Tang
2026
ConceptKT: A Benchmark for Concept-Level Deficiency Prediction in Knowledge Tracing
Yu-Chen Kang | Yu-Chien Tang | An-Zi Yen
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Knowledge Tracing (KT) is a critical technique for modeling student knowledge to support personalized learning. However, most KT systems focus on binary correctness prediction and cannot diagnose the underlying conceptual misunderstandings that lead to errors. Such fine-grained diagnostic feedback is essential for designing targeted instruction and effective remediation. In this work, we introduce the task of concept-level deficiency prediction, which extends traditional KT by identifying the specific concepts a student is likely to struggle with on future problems. We present ConceptKT, a dataset annotated with labels that capture both the concepts required to solve each question and the missing concepts underlying incorrect responses. We investigate in-context learning approaches to KT and evaluate the diagnostic capabilities of various Large Language Models (LLMs) and Large Reasoning Models (LRMs). Different strategies for selecting informative historical records are explored. Experimental results demonstrate that selecting response histories based on conceptual alignment and semantic similarity leads to improved performance on both correctness prediction and concept-level deficiency identification.
MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
Wei-Ling Hsu | Yu-Chien Tang | An-Zi Yen
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
The increasing reliance on Large Language Models (LLMs) across various domains extends to education, where students progressively use generative AI as a tool for learning. While prior work has examined LLMs’ mathematical ability, their reliability in grading authentic student problem-solving processes and delivering effective feedback remains underexplored. This study introduces MathEDU, a dataset consisting of student problem-solving processes in mathematics and corresponding teacher-written feedback. We systematically evaluate the reliability of various models across three hierarchical tasks: answer correctness classification, error identification, and feedback generation. Experimental results show that fine-tuning strategies effectively improve performance in classifying correctness and locating erroneous steps. However, the generated feedback across models shows a considerable gap from teacher-written feedback. Critically, the generated feedback is often verbose and fails to provide targeted explanations for the student’s underlying misconceptions. This emphasizes the urgent need for trustworthy and pedagogy-aware AI feedback in education.
2023
RSVP: Customer Intent Detection via Agent Response Contrastive and Generative Pre-Training
Yu-Chien Tang | Wei-Yao Wang | An-Zi Yen | Wen-Chih Peng
Findings of the Association for Computational Linguistics: EMNLP 2023
Dialogue systems in customer service have been developed with neural models to provide users with precise answers and round-the-clock support in task-oriented conversations by detecting customer intents from their utterances. Existing intent detection approaches rely heavily on adaptively pre-training language models with large-scale datasets, yet the high cost of data collection may limit their advantage. In addition, they neglect the information in the agents' conversational responses, which are cheaper to collect and are informative about customer intent, since agents must tailor their replies to the customers' intent. In this paper, we propose RSVP, a self-supervised framework dedicated to task-oriented dialogues, which utilizes agent responses for pre-training in a two-stage manner. Specifically, we introduce two pre-training tasks to incorporate the relations of utterance-response pairs: 1) Response Retrieval, which selects the correct response from a batch of candidates, and 2) Response Generation, which mimics agents by generating the response to a given utterance. Our benchmark results on two real-world customer service datasets show that RSVP significantly outperforms the state-of-the-art baselines by 4.95% in accuracy, 3.4% in MRR@3, and 2.75% in MRR@5 on average. Extensive case studies show the validity of incorporating agent responses into the pre-training stage.
2022
NYCU_TWD@LT-EDI-ACL2022: Ensemble Models with VADER and Contrastive Learning for Detecting Signs of Depression from Social Media
Wei-Yao Wang | Yu-Chien Tang | Wei-Wei Du | Wen-Chih Peng
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion
This paper presents a state-of-the-art solution to the LT-EDI-ACL 2022 Task 4: Detecting Signs of Depression from Social Media Text. The goal of this task is to detect the severity level of depression from people's social media posts, where they often share their feelings on a daily basis. To detect the signs of depression, we propose a framework that combines pre-trained language models, which use rich information instead of training from scratch; gradient boosting and deep learning models, which model various aspects; and supervised contrastive learning, which improves generalization ability. Moreover, ensemble techniques are employed to exploit the different advantages of each method. Experiments show that our framework achieves a 2nd prize ranking with a macro F1-score of 0.552, showing the effectiveness and robustness of our approach.