Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models
Francesca Padovani, Jaap Jumelet, Yevgen Matusevych, Arianna Bisazza
Abstract
Seminal work by Huebner et al. (2021) showed that language models (LMs) trained on English Child-Directed Language (CDL) can outperform LMs trained on an equal amount of adult-directed text like Wikipedia. However, it remains unclear whether these results generalize across languages, architectures, and evaluation settings. We test this by comparing models trained on CDL vs. Wikipedia across two LM objectives (masked and causal), three languages (English, French, German), and three syntactic minimal pair benchmarks. Our results on these benchmarks show inconsistent benefits of CDL, which in most cases is outperformed by Wikipedia models. We then identify various shortcomings in these benchmarks, and introduce a novel testing methodology, FIT-CLAMS, which uses a frequency-controlled design to enable balanced comparisons across training corpora. Through minimal pair evaluations and regression analysis we show that training on CDL does not yield stronger generalizations for acquiring syntax and highlight the importance of controlling for frequency effects when evaluating syntactic ability.
- Anthology ID:
- 2025.emnlp-main.999
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- Association for Computational Linguistics
- Pages:
- 19746–19767
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.999/
- Cite (ACL):
- Francesca Padovani, Jaap Jumelet, Yevgen Matusevych, and Arianna Bisazza. 2025. Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 19746–19767, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models (Padovani et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.999.pdf
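The minimal pair evaluation mentioned in the abstract scores a model by checking whether it assigns higher probability to the grammatical member of each sentence pair. A minimal sketch of that accuracy computation, assuming hypothetical per-token log-probabilities rather than scores from an actual masked or causal LM:

```python
def sentence_logprob(token_logprobs):
    """Total log-probability of a sentence from per-token log-probs."""
    return sum(token_logprobs)

def minimal_pair_accuracy(pairs):
    """Fraction of pairs where the grammatical sentence scores higher.

    Each pair is (grammatical_logprobs, ungrammatical_logprobs):
    lists of per-token log-probabilities. The values below are
    made up for illustration; in practice they would come from
    a trained language model.
    """
    correct = sum(
        sentence_logprob(good) > sentence_logprob(bad)
        for good, bad in pairs
    )
    return correct / len(pairs)

# Toy example with invented log-probabilities:
pairs = [
    ([-1.2, -0.8, -2.0], [-1.2, -3.5, -2.0]),  # model prefers grammatical
    ([-2.5, -1.0], [-2.0, -1.1]),              # model prefers ungrammatical
]
print(minimal_pair_accuracy(pairs))  # → 0.5
```

The benchmark-level score reported for a model is simply this accuracy over all pairs in a syntactic paradigm; chance performance is 0.5.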