Oliver Christ


2024

pdf
Scoring with Confidence? – Exploring High-confidence Scoring for Saving Manual Grading Effort
Marie Bexte | Andrea Horbach | Lena Schützler | Oliver Christ | Torsten Zesch
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

A possible way to save manual grading effort in short answer scoring is to automatically score answers for which the classifier is highly confident. We explore the feasibility of this approach in a high-stakes exam setting, evaluating three different similarity-based scoring methods, where the similarity score is a direct proxy for model confidence. The decision on an appropriate level of confidence should ideally be made before scoring a new prompt. We thus probe to what extent confidence thresholds are consistent across different datasets and prompts. We find that high-confidence thresholds vary on a prompt-to-prompt basis, and that the overall potential of increased performance at a reasonable cost of additional manual effort is limited.