Abstract
In this paper we investigate the potential of answer clustering for semi-automatic scoring of short answer questions for German as a foreign language. We use surface features like word and character n-grams to cluster answers to listening comprehension exercises per question and simulate having human graders only label one answer per cluster and then propagating this label to all other members of the cluster. We investigate various ways to select this single item to be labeled and find that choosing the item closest to the centroid of a cluster leads to improved (simulated) grading accuracy over random item selection. Averaged over all questions, we can reduce a teacher's workload to labeling only 40% of all different answers for a question, while still maintaining a grading accuracy of more than 85%.
- Anthology ID:
- L14-1680
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 588–595
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/887_Paper.pdf
- Cite (ACL):
- Andrea Horbach, Alexis Palmer, and Magdalena Wolska. 2014. Finding a Tradeoff between Accuracy and Rater’s Workload in Grading Clustered Short Answers. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 588–595, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- Finding a Tradeoff between Accuracy and Rater’s Workload in Grading Clustered Short Answers (Horbach et al., LREC 2014)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/887_Paper.pdf
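
The procedure summarized in the abstract (cluster the answers to one question, have a grader label the answer closest to each cluster centroid, then propagate that label to the rest of the cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a clustering algorithm, so the simple k-means variant with cosine similarity, the choice of character trigrams, the seeding with the first k answers, and all function names (`char_ngrams`, `cluster`, `propagate_labels`, `ask_grader`) are assumptions made purely for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Bag of character n-grams (assumed here: lowercased trigrams)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {k: c / len(vectors) for k, c in total.items()}


def cluster(vectors, k, iterations=10):
    """Assumed stand-in clustering: k-means with cosine similarity,
    seeded deterministically with the first k vectors."""
    centers = [dict(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iterations):
        assign = [max(range(k), key=lambda c: cosine(v, centers[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = centroid(members)
    return assign, centers


def propagate_labels(answers, k, ask_grader):
    """Ask the grader about one answer per cluster (the one most similar
    to the centroid) and copy that label to every cluster member."""
    vectors = [char_ngrams(a) for a in answers]
    assign, centers = cluster(vectors, k)
    labels = [None] * len(answers)
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        rep = max(members, key=lambda i: cosine(vectors[i], centers[c]))
        label = ask_grader(answers[rep])  # the only manual grading step
        for i in members:
            labels[i] = label
    return labels
```

In this sketch `ask_grader` is any callable that takes an answer string and returns a label, so the number of calls to it corresponds to the grader's workload, while the share of propagated labels that match the true labels corresponds to the simulated grading accuracy. Raising `k` moves along the tradeoff the paper studies: more clusters mean more manual labels but fewer wrongly propagated ones.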