bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning

Jihyeon Roh, Jinhyun Bang


Abstract
The growing use of large language models (LLMs) for AI-powered tutors in education highlights the need for reliable evaluation of their pedagogical abilities. In this work, we propose a reasoning-based evaluation methodology that leverages pedagogical domain knowledge to assess LLM-generated feedback in mathematical dialogues while providing insights into why a particular evaluation is given. We design structured prompts to invoke pedagogically-informed reasoning from LLMs and compare base model candidates selected for their strengths in reasoning, mathematics, and overall instruction-following. We employ Group Relative Policy Optimization (GRPO), a reinforcement learning method known to improve reasoning performance, to train models to perform evaluation in four pedagogically motivated dimensions, Mistake Identification, Mistake Location, Providing Guidance, and Actionability. Experimental results show that our GRPO-based models consistently outperform the base model and GPT-4.1, and surpass models trained using supervised fine-tuning in three out of four dimensions. Notably, our method achieved top-ranked performance in Actionability and competitive performance in two other dimensions in the BEA 2025 Shared Task under the team name bea-jh, underscoring the value of generating pedagogically grounded rationales for improving the quality of educational feedback evaluation.
Anthology ID:
2025.bea-1.80
Volume:
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1049–1059
Language:
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.80/
DOI:
Bibkey:
Cite (ACL):
Jihyeon Roh and Jinhyun Bang. 2025. bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 1049–1059, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning (Roh & Bang, BEA 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.80.pdf