Scoring with Confidence? – Exploring High-confidence Scoring for Saving Manual Grading Effort

Marie Bexte, Andrea Horbach, Lena Schützler, Oliver Christ, Torsten Zesch


Abstract
A possible way to save manual grading effort in short answer scoring is to automatically score answers for which the classifier is highly confident. We explore the feasibility of this approach in a high-stakes exam setting, evaluating three different similarity-based scoring methods, where the similarity score is a direct proxy for model confidence. The decision on an appropriate level of confidence should ideally be made before scoring a new prompt. We thus probe to what extent confidence thresholds are consistent across different datasets and prompts. We find that high-confidence thresholds vary on a prompt-to-prompt basis, and that the overall potential of increased performance at a reasonable cost of additional manual effort is limited.
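The core idea the abstract describes — automatically accepting an answer only when its similarity to a labeled reference clears a confidence threshold, and routing everything else to a human grader — can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function names, the toy vectors, and the threshold value are all hypothetical, and a real system would use similarity over learned answer embeddings.

```python
# Hypothetical sketch of threshold-based high-confidence scoring:
# answers whose best similarity to a labeled reference answer exceeds
# a threshold are scored automatically; the rest go to manual grading.
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def route(answer_vec, references, threshold):
    """references: list of (vector, label) pairs.
    Returns (label, 'auto') if the best similarity clears the
    threshold, else (None, 'manual')."""
    best_sim, best_label = max(
        (cosine(answer_vec, ref), label) for ref, label in references
    )
    if best_sim >= threshold:
        return best_label, "auto"
    return None, "manual"

# Toy example: two reference answers with known labels.
refs = [([1.0, 0.0], "correct"), ([0.0, 1.0], "incorrect")]
print(route([0.9, 0.1], refs, threshold=0.8))  # high similarity -> auto
print(route([0.6, 0.6], refs, threshold=0.8))  # ambiguous -> manual
```

The paper's central question is where to set `threshold`: a prompt-independent value would let the cutoff be fixed before grading begins, but the findings suggest suitable thresholds vary per prompt.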
Anthology ID:
2024.bea-1.11
Volume:
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Ekaterina Kochmar, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anaïs Tack, Victoria Yaneva, Zheng Yuan
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Note:
Pages:
119–124
URL:
https://aclanthology.org/2024.bea-1.11
Cite (ACL):
Marie Bexte, Andrea Horbach, Lena Schützler, Oliver Christ, and Torsten Zesch. 2024. Scoring with Confidence? – Exploring High-confidence Scoring for Saving Manual Grading Effort. In Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), pages 119–124, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Scoring with Confidence? – Exploring High-confidence Scoring for Saving Manual Grading Effort (Bexte et al., BEA 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2024.bea-1.11.pdf