Lessons Learned in Assessing Student Reflections with LLMs

Mohamed Elaraby, Diane Litman

Abstract
Advances in Large Language Models (LLMs) have sparked growing interest in their potential as explainable text evaluators. While LLMs have shown promise in assessing machine-generated texts in tasks such as summarization and machine translation, their effectiveness in evaluating human-written content—such as student writing in classroom settings—remains underexplored. In this paper, we investigate LLM-based specificity assessment of student reflections written in response to prompts, using three instruction-tuned models. Our findings indicate that although LLMs may underperform compared to simpler supervised baselines in terms of scoring accuracy, they offer a valuable interpretability advantage. Specifically, LLMs can generate user-friendly explanations that enhance the transparency and usability of automated specificity scoring systems.
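To make the task concrete, the following is a minimal sketch of prompt-based specificity scoring with an accompanying explanation. The rubric wording, the 1-4 score scale, the JSON output format, and the generic model interface are all illustrative assumptions; the paper's actual prompts, rubrics, and instruction-tuned models are not specified in this abstract.

    # Minimal sketch: score the specificity of one student reflection with an
    # instruction-tuned LLM and return a short, user-facing explanation.
    # Rubric, scale, and output format are assumptions for illustration only.

    import json
    from typing import Callable

    PROMPT_TEMPLATE = """You are grading the specificity of a student reflection
    written in response to the course prompt: "{course_prompt}"

    Reflection:
    "{reflection}"

    Rate specificity from 1 (very vague) to 4 (very specific) and briefly explain
    your rating. Respond with JSON:
    {{"score": <1-4>, "explanation": "<one or two sentences>"}}"""

    def assess_specificity(reflection: str,
                           course_prompt: str,
                           generate: Callable[[str], str]) -> dict:
        """Score one reflection with any LLM exposed as a text-in/text-out
        callable (e.g., a thin wrapper around a chat-completion client)."""
        raw = generate(PROMPT_TEMPLATE.format(course_prompt=course_prompt,
                                              reflection=reflection))
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Fall back gracefully if the model does not emit valid JSON.
            return {"score": None, "explanation": raw.strip()}

Returning the free-text explanation alongside the numeric score is what distinguishes this setup from a supervised regression baseline: the score may be less accurate, but the rationale can be shown directly to instructors or students.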
Anthology ID:
2025.bea-1.48
Volume:
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Ekaterina Kochmar, Bashar Alhafni, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
Publisher:
Association for Computational Linguistics
Pages:
672–686
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.48/
Cite (ACL):
Mohamed Elaraby and Diane Litman. 2025. Lessons Learned in Assessing Student Reflections with LLMs. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 672–686, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Lessons Learned in Assessing Student Reflections with LLMs (Elaraby & Litman, BEA 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bea-1.48.pdf