DLSU at BEA 2025 Shared Task: Towards Establishing Baseline Models for Pedagogical Response Evaluation Tasks

Maria Monica Manlises, Mark Edward Gonzales, Lanz Lim


Abstract
We present our submission to Tracks 3 (Providing Guidance), 4 (Actionability), and 5 (Tutor Identification) of the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-Powered Tutors. We investigated how well sentence embeddings of tutor responses perform when used directly as input to downstream classifiers, that is, without any fine-tuning. To this end, we benchmarked two general-purpose sentence embedding models, gte-modernbert-base (GTE) and all-MiniLM-L12-v2, in combination with two downstream classifiers, XGBoost and a multilayer perceptron. Feeding GTE embeddings to a multilayer perceptron achieved macro-F1 scores of 0.4776, 0.5294, and 0.6420 on the official test sets for Tracks 3, 4, and 5, respectively. While overall performance was modest, these results offer insights into the challenges of pedagogical response evaluation and establish a baseline for future improvements.
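For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of macro-F1, assuming the standard definition: the unweighted mean of per-class F1 scores. The function name `macro_f1` and the example labels are hypothetical illustrations, not the shared task's official scorer or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical three-class example (labels chosen for illustration only).
y_true = ["Yes", "Yes", "No", "To some extent"]
y_pred = ["Yes", "No", "No", "To some extent"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally regardless of how often it occurs, a classifier that does poorly on rare classes is penalized more under macro-F1 than under accuracy.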
Anthology ID:
2025.bea-1.101
Volume:
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
Publisher:
Association for Computational Linguistics
Pages:
1260–1265
URL:
https://preview.aclanthology.org/landing_page/2025.bea-1.101/
Cite (ACL):
Maria Monica Manlises, Mark Edward Gonzales, and Lanz Lim. 2025. DLSU at BEA 2025 Shared Task: Towards Establishing Baseline Models for Pedagogical Response Evaluation Tasks. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 1260–1265, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
DLSU at BEA 2025 Shared Task: Towards Establishing Baseline Models for Pedagogical Response Evaluation Tasks (Manlises et al., BEA 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.bea-1.101.pdf