TutorMind at BEA 2025 Shared Task: Leveraging Fine-Tuned LLMs and Data Augmentation for Mistake Identification

Fatima Dekmak, Christian Khairallah, Wissam Antoun


Abstract
In light of the growing adoption of large language models (LLMs) as educational tutors, it is crucial to effectively evaluate their pedagogical capabilities across multiple dimensions. Toward this goal, we address the Mistake Identification sub-task of the BEA 2025 Shared Task, which assesses how accurately tutors detect and identify student errors. We experiment with several LLMs, including GPT-4o-mini, Mistral-7B, and Llama-3.1-8B, evaluating them in both zero-shot and fine-tuned settings. To address class imbalance, we augment the training data with synthetic examples generated by Command R+, targeting underrepresented labels. Our GPT-4o model, fine-tuned on the full development set, achieves a strict macro-averaged F1 score of 71.63%, ranking second in the shared task. Our work highlights the effectiveness of fine-tuning on task-specific data and suggests that targeted data augmentation can further support LLM performance on nuanced pedagogical evaluation tasks.
Anthology ID:
2025.bea-1.96
Volume:
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
Publisher:
Association for Computational Linguistics
Pages:
1203–1211
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.96/
Cite (ACL):
Fatima Dekmak, Christian Khairallah, and Wissam Antoun. 2025. TutorMind at BEA 2025 Shared Task: Leveraging Fine-Tuned LLMs and Data Augmentation for Mistake Identification. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 1203–1211, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
TutorMind at BEA 2025 Shared Task: Leveraging Fine-Tuned LLMs and Data Augmentation for Mistake Identification (Dekmak et al., BEA 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.96.pdf