Kate Belcher


2026

In this paper, we present the IWM-DKM team submissions to the BEA 2026 Shared Task 2: Rubric-based Short Answer Scoring for German. We systematically explore how fine-tuned language models can be employed reliably for short answer scoring. Three additions to the fine-tuning process prove particularly beneficial: generated domain expertise, restructured rubrics, and thinking traces. To make the scoring more robust, we combine distinct approaches in an ensemble. Our best submissions finished in first place across all tracks, indicating promise for the further application of these strategies in automatic scoring.