Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models?

Arzu Burcu Güven, Anna Rogers, Rob Van Der Goot


Abstract
We examine the syntactic properties of the BabyLM corpus and of age groups within CHILDES. While we find that CHILDES does not exhibit strong syntactic differentiation by age, we show that syntactic knowledge about the training data can be helpful in interpreting model performance on linguistic tasks. For curriculum learning, we explore a developmental curriculum and several alternative cognitively inspired curriculum approaches. We find that some curricula help with reading tasks, but the main performance improvement comes from using the subset of syntactically categorizable data rather than the full noisy corpus.
Anthology ID: 2025.babylm-main.22
Volume: Proceedings of the First BabyLM Workshop
Month: November
Year: 2025
Address: Suzhou, China
Editors: Lucas Charpentier, Leshem Choshen, Ryan Cotterell, Mustafa Omer Gul, Michael Y. Hu, Jing Liu, Jaap Jumelet, Tal Linzen, Aaron Mueller, Candace Ross, Raj Sanjay Shah, Alex Warstadt, Ethan Gotlieb Wilcox, Adina Williams
Venue: BabyLM
Publisher: Association for Computational Linguistics
Pages: 288–300
URL: https://preview.aclanthology.org/ingest-emnlp/2025.babylm-main.22/
Cite (ACL): Arzu Burcu Güven, Anna Rogers, and Rob Van Der Goot. 2025. Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models? In Proceedings of the First BabyLM Workshop, pages 288–300, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models? (Güven et al., BabyLM 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.babylm-main.22.pdf