Automatic Extraction of Nominal Phrases from German Learner Texts of Different Proficiency Levels

Ronja Laarmann-Quante, Marco Müller, Eva Belke


Abstract
Correctly inflecting determiners and adjectives so that they agree with the noun in nominal phrases (NPs) is a major challenge for learners of German. Given the increasing number of available learner corpora, a large-scale corpus-based study on the acquisition of this aspect of German morphosyntax would be desirable. In this paper, we present a pilot study in which we investigate how well nouns, their grammatical heads, and the dependents that have to agree with the noun can be extracted automatically via dependency parsing. For six samples of the German learner corpus MERLIN (one per proficiency level), we found that, in spite of many ungrammatical sentences in texts of low proficiency levels, human annotators find only a few true ambiguities that would make the extraction of NPs and their heads infeasible. The automatic parsers, however, perform rather poorly at extracting the relevant elements from texts at CEFR levels A1-B1 (< 70%) but quite well from level B2 onwards ( 90%). We discuss the sources of errors and how performance could potentially be increased in the future.
Anthology ID:
2024.lrec-main.172
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
1925–1931
URL:
https://aclanthology.org/2024.lrec-main.172
Cite (ACL):
Ronja Laarmann-Quante, Marco Müller, and Eva Belke. 2024. Automatic Extraction of Nominal Phrases from German Learner Texts of Different Proficiency Levels. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 1925–1931, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Automatic Extraction of Nominal Phrases from German Learner Texts of Different Proficiency Levels (Laarmann-Quante et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/add_acl24_videos/2024.lrec-main.172.pdf