Kangsan Noh

2026

Do BabyLMs Wanna Learn Wanna Contraction? On the Learnability without Language-Specific Bias
Kangsan Noh | Sanghoun Song
Findings of the Association for Computational Linguistics: ACL 2026

This study investigates whether the grammatical constraints on wanna contraction, a phenomenon traditionally cited as evidence for innate linguistic knowledge, can be learned by BabyLMs, language models designed to reflect cognitively plausible learning conditions. Two datasets were constructed from the CHILDES corpus, varying in embedded-verb frequency (high vs. low) and in grammaticality: grammatical wanna contractions (object extraction contexts) were contrasted with ungrammatical ones (subject extraction contexts). Using surprisal as the metric, we evaluated 24 BabyLMs from the 2024 BabyLM Challenge alongside four standard models, including BERT and GPT-2. While the standard models performed with near-perfect consistency, the BabyLMs showed modest but meaningful sensitivity, particularly those trained on larger datasets and when tested on high-frequency wanna instances. Moreover, only encoder-based BabyLMs captured the grammatical constraint, with babylm24_MLSM exhibiting consistent performance. Overall, our findings provide evidence for limited and conditional learnability of wanna contraction by artificial learners under cognitively realistic input conditions.
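The surprisal comparison behind this kind of evaluation can be illustrated with a minimal sketch. It assumes that per-token probabilities for each sentence's critical region have already been obtained from a model (for a decoder such as GPT-2, from left-to-right conditional probabilities; for an encoder such as BERT, typically from masked-token pseudo-likelihoods). It counts an item as correct when the grammatical variant receives lower summed surprisal than the ungrammatical one. The function names, the summed-region criterion, and the probabilities in the usage example are illustrative assumptions, not the authors' procedure or data. A classic minimal pair is "Who do you wanna see?" (object extraction, grammatical) versus "*Who do you wanna see Mary?" (subject extraction, ungrammatical).

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 p. Probabilities must lie in (0, 1]."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def region_surprisal(token_probs: list[float]) -> float:
    """Sum surprisal over the tokens of a critical region.

    Hypothetical choice: the region (e.g. 'wanna' plus the following words)
    is assumed to be selected beforehand; the source does not specify it.
    """
    return sum(surprisal(p) for p in token_probs)


def prefers_grammatical(grammatical: list[float], ungrammatical: list[float]) -> bool:
    """True if the grammatical variant is strictly less surprising."""
    return region_surprisal(grammatical) < region_surprisal(ungrammatical)


def accuracy(pairs: list[tuple[list[float], list[float]]]) -> float:
    """Proportion of minimal pairs where the grammatical item wins."""
    if not pairs:
        raise ValueError("need at least one minimal pair")
    hits = sum(prefers_grammatical(g, u) for g, u in pairs)
    return hits / len(pairs)
```

For example, accuracy([([0.5, 0.25], [0.125, 0.25]), ([0.5, 0.5], [0.5, 0.5])]) returns 0.5: the first pair favors the grammatical item (3 vs. 5 bits), while the second is a tie, which this strict criterion counts as a miss.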

