Do Construction Distributions Shape Formal Language Learning In German BabyLMs?

Bastian Bunzeck, Daniel Duran, Sina Zarrieß


Abstract
We analyze the influence of utterance-level construction distributions in German child-directed/child-available speech on the resulting word-level, syntactic and semantic competence (and their underlying learning trajectories) in small LMs, which we train on a novel collection of developmentally plausible language data for German. We find that trajectories are surprisingly robust for markedly different distributions of constructions in the training data, which have little effect on final accuracies and almost no effect on global learning trajectories. While syntax learning benefits from more complex utterances, word-level learning culminates in better scores with more fragmentary utterances. We argue that LMs trained on developmentally plausible data can contribute to debates on how conducive different kinds of linguistic stimuli are to language learning.
Anthology ID:
2025.conll-1.12
Volume:
Proceedings of the 29th Conference on Computational Natural Language Learning
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Gemma Boleda, Michael Roth
Venues:
CoNLL | WS
Publisher:
Association for Computational Linguistics
Pages:
169–186
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.conll-1.12/
Cite (ACL):
Bastian Bunzeck, Daniel Duran, and Sina Zarrieß. 2025. Do Construction Distributions Shape Formal Language Learning In German BabyLMs? In Proceedings of the 29th Conference on Computational Natural Language Learning, pages 169–186, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Do Construction Distributions Shape Formal Language Learning In German BabyLMs? (Bunzeck et al., CoNLL 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.conll-1.12.pdf