Henry at BEA 2025 Shared Task: Improving AI Tutor’s Guidance Evaluation Through Context-Aware Distillation
Henry Pit
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Effective AI tutoring hinges on guiding learners with the right balance of support. In this work, we introduce CODE (COntextually-aware Distilled Evaluator), a framework that harnesses advanced large language models (i.e., GPT-4o and Claude-2.7) to generate synthetic, context-aware justifications for human-annotated tutor responses in the BEA 2025 Shared Task. By distilling these justifications into a smaller open-source model (i.e., Phi-3.5-mini-instruct) via supervised fine-tuning followed by Group Relative Policy Optimization, we achieve substantial gains in label prediction over direct prompting of proprietary LLMs. Our experiments show that CODE reliably identifies strongly positive and strongly negative guidance but, like prior work, struggles to distinguish nuanced “middle-ground” cases where partial hints blur into vagueness. We argue that overcoming this limitation will require explicit, feature-based evaluation metrics that systematically map latent pedagogical qualities to model outputs, enabling more transparent and robust assessment of AI-driven tutoring.
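A minimal sketch of the two-stage distillation the abstract describes, using Hugging Face TRL: supervised fine-tuning on LLM-generated justifications, then Group Relative Policy Optimization. This is not the authors' released code; the dataset files, column names, and the simple label-match reward are illustrative assumptions.

```python
# Sketch only: SFT on synthetic justifications, then GRPO on label agreement.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

MODEL = "microsoft/Phi-3.5-mini-instruct"

# Stage 1: SFT on texts pairing tutor-response context with an LLM-generated
# justification and label. "justifications.jsonl" (with a "text" column) is hypothetical.
sft_data = load_dataset("json", data_files="justifications.jsonl", split="train")
sft_trainer = SFTTrainer(
    model=MODEL,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="phi35-sft", num_train_epochs=2),
)
sft_trainer.train()

# Stage 2: GRPO. Assumed reward: 1.0 if the gold label string appears in the
# sampled completion, else 0.0. The dataset has "prompt" and "label" columns.
def label_match_reward(completions, label, **kwargs):
    return [
        1.0 if gold.lower() in completion.lower() else 0.0
        for completion, gold in zip(completions, label)
    ]

grpo_data = load_dataset("json", data_files="prompts_with_labels.jsonl", split="train")
grpo_trainer = GRPOTrainer(
    model="phi35-sft",                # continue from the SFT checkpoint
    reward_funcs=label_match_reward,
    train_dataset=grpo_data,
    args=GRPOConfig(output_dir="phi35-grpo", num_generations=8),
)
grpo_trainer.train()
```

In this setup, GRPO compares groups of sampled justifications for the same prompt and reinforces those whose predicted label agrees with the human annotation; a richer reward (e.g., partial credit for near-miss labels) would be a natural extension.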