Jonathan Nylk
2026
Domain-Adaptive Pre-training for Automated Short Answer Grading in Conceptual Physics: Reliability, Question-Level Analysis, and Error Reduction
Shirin Lade | Alistair Willis | Jonathan Nylk | Oli Howson
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
This paper investigates whether automated short answer grading can reliably support teachers when marking conceptual physics responses in settings with limited labelled data. Using free-text responses derived from Force Concept Inventory-style questions, the study shows that incorporating subject-specific knowledge improves grading consistency, particularly in early deployment scenarios. The system reduces grading errors and provides more reliable agreement with reference judgments, especially for more challenging questions. These results suggest that automated grading can assist teachers by supporting marking decisions and prioritising responses for review, while still requiring human oversight.