BJTU at BEA 2025 Shared Task: Task-Aware Prompt Tuning and Data Augmentation for Evaluating AI Math Tutors

Yuming Fan, Chuangchuang Tan, Wenyu Song


Abstract
We present a prompt-based evaluation framework for assessing AI-generated math tutoring responses across four pedagogical dimensions: mistake identification, mistake location, guidance quality, and actionability. Our approach leverages task-aware prompt tuning on a large language model, supplemented by data augmentation techniques including dialogue shuffling and class-balanced downsampling. In experiments on the BEA 2025 Shared Task benchmark, our system achieved first place in mistake identification and strong top-five rankings in the other tracks. These results demonstrate the effectiveness of structured prompting and targeted augmentation for enhancing LLMs’ ability to provide pedagogically meaningful feedback.
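The abstract names two augmentation techniques, dialogue shuffling and class-balanced downsampling. A minimal sketch of what such operations might look like is below; the function names and exact procedure are assumptions for illustration, not the authors' released code.

```python
import random
from collections import Counter

def shuffle_dialogue(turns, seed=None):
    """Dialogue shuffling (hypothetical variant): permute the
    (speaker, utterance) turns of a tutoring dialogue to produce
    an augmented training copy."""
    rng = random.Random(seed)
    augmented = list(turns)
    rng.shuffle(augmented)
    return augmented

def class_balanced_downsample(examples, labels, seed=None):
    """Class-balanced downsampling: randomly keep only as many
    examples per class as the rarest class has, so every label
    is equally represented.  Returns (example, label) pairs."""
    rng = random.Random(seed)
    target = min(Counter(labels).values())
    by_class = {}
    for ex, lab in zip(examples, labels):
        by_class.setdefault(lab, []).append(ex)
    balanced = []
    for lab, items in by_class.items():
        balanced.extend((ex, lab) for ex in rng.sample(items, target))
    return balanced
```

For instance, on labels `["yes", "yes", "yes", "no", "no"]` the downsampler keeps two examples per class, yielding four pairs in total.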
Anthology ID:
2025.bea-1.82
Volume:
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
Publisher:
Association for Computational Linguistics
Pages:
1073–1077
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.82/
Cite (ACL):
Yuming Fan, Chuangchuang Tan, and Wenyu Song. 2025. BJTU at BEA 2025 Shared Task: Task-Aware Prompt Tuning and Data Augmentation for Evaluating AI Math Tutors. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 1073–1077, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
BJTU at BEA 2025 Shared Task: Task-Aware Prompt Tuning and Data Augmentation for Evaluating AI Math Tutors (Fan et al., BEA 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.82.pdf