Jonathan Nylk


2026

This paper investigates whether automated short answer grading can reliably support teachers marking conceptual physics responses when labelled data are limited. Using free-text responses derived from Force Concept Inventory-style questions, the study shows that incorporating subject-specific knowledge improves grading consistency, particularly in early deployment scenarios. With this knowledge incorporated, the system makes fewer grading errors and agrees more reliably with reference judgments, especially on more challenging questions. These results suggest that automated grading can assist teachers by supporting marking decisions and prioritising responses for review, while still requiring human oversight.