Jennifer Meyer


2024

DARIUS: A Comprehensive Learner Corpus for Argument Mining in German-Language Essays
Nils-Jonathan Schaller | Andrea Horbach | Lars Ingver Höft | Yuning Ding | Jan Luca Bahr | Jennifer Meyer | Thorben Jansen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we present the DARIUS (Digital Argumentation Instruction for Science) corpus for argumentation quality, comprising 4589 essays written by 1839 German secondary school students. The corpus is annotated according to a fine-grained annotation scheme, ranging from broader features such as content zones to more granular ones such as argumentation coverage/reach and argumentative discourse units like claims and warrants. The features reach inter-annotator agreements of up to 0.83 Krippendorff's α. The corpus and dataset are publicly available for further research in argument mining.
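
The agreement statistic reported above, Krippendorff's α, is computed from a matrix of annotator judgments over shared units. As a minimal sketch (using the third-party Python package krippendorff; the label matrix below is invented for illustration and is not DARIUS data):

    import numpy as np
    import krippendorff  # pip install krippendorff

    # Toy reliability matrix: rows = annotators, columns = annotated units
    # (e.g., discourse-unit labels coded as integers). np.nan marks units
    # that an annotator did not label.
    reliability_data = np.array([
        [1,      2, 3, 3, 2, 1, 4, 1, 2, np.nan],
        [1,      2, 3, 3, 2, 2, 4, 1, 2, 5],
        [np.nan, 3, 3, 3, 2, 3, 4, 2, 2, 5],
    ])

    # Nominal level of measurement, since discourse-unit labels are categories.
    alpha = krippendorff.alpha(reliability_data=reliability_data,
                               level_of_measurement="nominal")
    print(f"Krippendorff's alpha: {alpha:.2f}")

α can fall below 0 for systematic disagreement and reaches 1 for perfect agreement; values around 0.8 and above are conventionally read as reliable annotation, so 0.83 indicates substantial agreement.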

Fairness in Automated Essay Scoring: A Comparative Analysis of Algorithms on German Learner Essays from Secondary Education
Nils-Jonathan Schaller | Yuning Ding | Andrea Horbach | Jennifer Meyer | Thorben Jansen
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

Pursuing educational equity, particularly in writing instruction, requires that all students receive fair (i.e., accurate and unbiased) assessment of and feedback on their texts. Automated Essay Scoring (AES) algorithms have so far focused on optimizing the mean accuracy of their scores and paid less attention to fair scores for all subgroups, although research shows that students receive unfair essay scores in relation to demographic variables, which in turn are related to their writing competence. We add to the literature arguing that AES should also optimize for fairness by presenting insights on the fairness of scoring algorithms on a corpus of learner texts in the German language, and we introduce the novelty of examining fairness with respect to psychological differences in addition to demographic ones. We compare shallow learning, deep learning, and large language models, trained on full and skewed subsets of the data, to investigate what is needed for fair scoring. The results show that training on a subset skewed toward students of higher or lower cognitive ability introduces no bias but yields very low accuracy for students outside the training set. Our results highlight the need for training data that covers all relevant user groups, not only with respect to demographic background variables but also to cognitive abilities as psychological student characteristics.
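
A common way to probe the kind of subgroup bias discussed above is to compare human-machine agreement separately per subgroup, for instance with quadratically weighted kappa (QWK). The sketch below uses plain scikit-learn; the scores and group labels are simulated, and this is not necessarily the paper's exact evaluation protocol:

    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    # Simulated data: human and automated essay scores on a 0-4 scale,
    # plus a binary subgroup attribute (demographic or psychological).
    rng = np.random.default_rng(0)
    human = rng.integers(0, 5, size=200)
    model = np.clip(human + rng.integers(-1, 2, size=200), 0, 4)
    group = rng.choice(["A", "B"], size=200)

    # Per-subgroup agreement (QWK) and mean score difference: a large gap
    # between subgroups suggests the scorer favors one group over the other.
    for g in ("A", "B"):
        mask = group == g
        qwk = cohen_kappa_score(human[mask], model[mask], weights="quadratic")
        diff = (model[mask] - human[mask]).mean()
        print(f"group {g}: QWK = {qwk:.2f}, mean(model - human) = {diff:+.2f}")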

2022

Bringing Automatic Scoring into the Classroom - Measuring the Impact of Automated Analytic Feedback on Student Writing Performance
Andrea Horbach | Ronja Laarmann-Quante | Lucas Liebenow | Thorben Jansen | Stefan Keller | Jennifer Meyer | Torsten Zesch | Johanna Fleckenstein
Proceedings of the 11th Workshop on NLP for Computer Assisted Language Learning