Finding a Tradeoff between Accuracy and Rater’s Workload in Grading Clustered Short Answers

Andrea Horbach, Alexis Palmer, Magdalena Wolska


Abstract
In this paper we investigate the potential of answer clustering for semi-automatic scoring of short-answer questions for German as a foreign language. We use surface features like word and character n-grams to cluster answers to listening comprehension exercises per question, and we simulate a setting in which human graders label only one answer per cluster and this label is then propagated to all other members of the cluster. We investigate various ways to select this single item to be labeled and find that choosing the item closest to the centroid of a cluster leads to improved (simulated) grading accuracy over random item selection. Averaged over all questions, we can reduce a teacher’s workload to labeling only 40% of all different answers for a question, while still maintaining a grading accuracy of more than 85%.
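For illustration, the following is a minimal sketch (not the authors' implementation) of the label-propagation setup the abstract describes. It assumes scikit-learn's TfidfVectorizer with character n-grams as surface features and KMeans for per-question clustering; the function name, the number of clusters, and the toy answers are hypothetical.

# Illustrative sketch: cluster the answers to one question, have a simulated
# rater label only the answer nearest each cluster centroid, and propagate
# that label to all other members of the cluster.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def propagate_labels(answers, gold_labels, n_clusters=3, seed=0):
    # Surface features: character n-grams (word n-grams work analogously).
    vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    X = vectorizer.fit_transform(answers)

    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    cluster_ids = km.fit_predict(X)

    predicted = [None] * len(answers)
    for c in range(n_clusters):
        members = np.where(cluster_ids == c)[0]
        if len(members) == 0:
            continue
        # Select the single item to be labeled: the member closest to the centroid.
        dists = np.linalg.norm(X[members].toarray() - km.cluster_centers_[c], axis=1)
        labeled_item = members[np.argmin(dists)]
        # The simulated rater supplies the gold label for that one item only ...
        rater_label = gold_labels[labeled_item]
        # ... and the label is propagated to every member of the cluster.
        for i in members:
            predicted[i] = rater_label
    return predicted

if __name__ == "__main__":
    # Hypothetical learner answers to one listening comprehension question.
    answers = ["am Montag", "montag", "am monntag", "dienstag", "am Dienstag", "keine Ahnung"]
    gold = ["correct", "correct", "correct", "incorrect", "incorrect", "incorrect"]
    pred = propagate_labels(answers, gold, n_clusters=3)
    accuracy = sum(p == g for p, g in zip(pred, gold)) / len(gold)
    print(pred, accuracy)

In this simulation, grading accuracy is the fraction of answers whose propagated label matches the gold label, while the rater's workload is the number of clusters relative to the number of distinct answers.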
Anthology ID:
L14-1680
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
588–595
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/887_Paper.pdf
Cite (ACL):
Andrea Horbach, Alexis Palmer, and Magdalena Wolska. 2014. Finding a Tradeoff between Accuracy and Rater’s Workload in Grading Clustered Short Answers. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 588–595, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Finding a Tradeoff between Accuracy and Rater’s Workload in Grading Clustered Short Answers (Horbach et al., LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/887_Paper.pdf