@inproceedings{michael-horbach-2025-germdetect,
  title     = {{GermDetect}: Verb Placement Error Detection Datasets for Learners of {Germanic} Languages},
  author    = {Michael, Noah-Manuel and
               Horbach, Andrea},
  editor    = {Kochmar, Ekaterina and
               Alhafni, Bashar and
               Bexte, Marie and
               Burstein, Jill and
               Horbach, Andrea and
               Laarmann-Quante, Ronja and
               Tack, Ana{\"i}s and
               Yaneva, Victoria and
               Yuan, Zheng},
  booktitle = {Proceedings of the 20th Workshop on Innovative Use of {NLP} for Building Educational Applications ({BEA} 2025)},
  month     = jul,
  year      = {2025},
  address   = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.bea-1.59/},
  pages     = {818--829},
  isbn      = {979-8-89176-270-1},
  abstract  = {Correct verb placement is difficult to acquire for second-language learners of Germanic languages. However, word order errors and, consequently, verb placement errors, are heavily underrepresented in benchmark datasets of NLP tasks such as grammatical error detection/correction and linguistic acceptability assessment. If they are present, they are most often naively introduced, or classification occurs at the sentence level, preventing the precise identification of individual errors and the provision of appropriate feedback to learners. To remedy this, we present GermDetect: Universal Dependencies-based, linguistically informed verb placement error detection datasets for learners of Germanic languages, designed as a token classification task. As our datasets are UD-based, we are able to provide them in most major Germanic languages: Afrikaans, German, Dutch, Faroese, Icelandic, Danish, Norwegian (Bokm{\aa}l and Nynorsk), and Swedish. We train multilingual BERT models on GermDetect and show that linguistically informed, UD-based error induction results in more effective models for verb placement error detection than models trained on naively introduced errors. Finally, we conduct ablation studies on multilingual training and find that lower-resource languages benefit from the inclusion of structurally related languages in training.},
}