Malte Sternik

2026

Using k-Shot Prompting with Large k for the Automated Scoring of a German Written Elicited Imitation Test
Malte Sternik | Ronja Laarmann-Quante | Anastasia Drackert
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)

This paper explores the use of a Large Language Model (LLM) with k-shot prompting and large k to automatically score a German Written Elicited Imitation Test (WEIT), a test that assesses literacy-dependent procedural knowledge in German as a foreign language. In this test, test-takers are briefly shown written sentences, which they then have to reproduce in writing as accurately as possible. The responses are scored on an ordinal scale that distinguishes between types of errors (e.g., lexical vs. grammatical). We find that accuracy increases significantly as k grows (over a range from 1 to 700), but it also depends on which sample of examples is drawn and varies across runs of the same prompt. Overall, the k-shot setting, which relies on in-context learning and is not given the scoring rubric, outperforms a baseline in which the model is given only the scoring rubric. However, the LLM does not outperform previous results obtained with rule-based or BERT-based models.
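As a rough illustration of the k-shot setting described in the abstract, the following minimal sketch draws k scored examples at random and concatenates them into a prompt that ends with the unscored response. Everything here is an assumption for illustration only: the function names, the prompt layout, the German example sentences, and the 0-3 score values are made up and are not taken from the paper. The seed parameter reflects the abstract's observation that results depend on the drawn sample.

```python
import random

# Hypothetical, made-up items: (stimulus, response, score).
# The 0-3 score values are placeholders, not the paper's rubric.
EXAMPLES = [
    ("Der Hund schläft im Garten.", "Der Hund schläft im Garten.", 3),
    ("Der Hund schläft im Garten.", "Der Hund schlafen im Garten.", 2),
    ("Sie liest jeden Abend ein Buch.", "Sie liest jeden Abend eine Zeitung.", 1),
    ("Wir fahren morgen nach Berlin.", "Wir morgen.", 0),
]


def format_item(stimulus, response, score=None):
    """Render one item; leave the score open when it is the item to be scored."""
    text = f"Stimulus: {stimulus}\nResponse: {response}\nScore:"
    return text if score is None else f"{text} {score}"


def build_k_shot_prompt(examples, stimulus, response, k, seed=0):
    """Sample k scored examples (without replacement) and append the query item."""
    rng = random.Random(seed)          # a fixed seed makes the drawn sample reproducible
    shots = rng.sample(examples, k)    # raises ValueError if k > len(examples)
    blocks = [format_item(*shot) for shot in shots]
    blocks.append(format_item(stimulus, response))
    return "\n\n".join(blocks)


prompt = build_k_shot_prompt(
    EXAMPLES,
    "Wir fahren morgen nach Berlin.",
    "Wir fahren morgen nach Berlin.",
    k=2,
    seed=42,
)
print(prompt)
```

Changing `seed` draws a different set of examples for the same k, which is one simple way to examine how much the outcome depends on the sample.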
