2025
BJTU at BEA 2025 Shared Task: Task-Aware Prompt Tuning and Data Augmentation for Evaluating AI Math Tutors
Yuming Fan | Chuangchuang Tan | Wenyu Song
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
We present a prompt-based evaluation framework for assessing AI-generated math tutoring responses across four pedagogical dimensions: mistake identification, mistake location, guidance quality, and actionability. Our approach leverages task-aware prompt tuning on a large language model, supplemented by data augmentation techniques including dialogue shuffling and class-balanced downsampling. In experiments on the BEA 2025 Shared Task benchmark, our system achieved first place in mistake identification and strong top-five rankings in the other tracks. These results demonstrate the effectiveness of structured prompting and targeted augmentation for enhancing LLMs’ ability to provide pedagogically meaningful feedback.
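The two augmentation techniques named in the abstract, dialogue shuffling and class-balanced downsampling, can be illustrated with a minimal sketch. The paper's actual implementation is not shown here; the function names, data layout, and the choice to shuffle whole turns and to downsample every class to the size of the smallest one are assumptions for illustration only.

```python
import random
from collections import defaultdict

def shuffle_dialogue_turns(dialogue, seed=None):
    """Hypothetical dialogue-shuffling augmentation: return a copy of a
    tutoring dialogue with its turns reordered. `dialogue` is a list of
    (speaker, utterance) pairs; whether the original system preserves
    speaker alternation is not specified, so this sketch does not."""
    rng = random.Random(seed)
    augmented = list(dialogue)
    rng.shuffle(augmented)
    return augmented

def downsample_balanced(examples, seed=None):
    """Hypothetical class-balanced downsampling: keep as many examples per
    label as the smallest class contains. `examples` is a list of dicts
    with a 'label' key."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)
    min_count = min(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, min_count))
    rng.shuffle(balanced)
    return balanced

if __name__ == "__main__":
    dialogue = [
        ("student", "I think 3/4 + 1/4 = 4/8."),
        ("tutor", "Let's look at the denominators first."),
        ("student", "Oh, they are the same, so I only add the numerators."),
    ]
    print(shuffle_dialogue_turns(dialogue, seed=0))

    # Example label set loosely follows the shared task's annotation scheme.
    data = [{"label": "Yes"}] * 5 + [{"label": "No"}] * 2 + [{"label": "To some extent"}] * 3
    print(len(downsample_balanced(data, seed=0)))  # 2 per class -> 6 examples
```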