Analyzing the Linguistic Priors of Language Models with Synthetic Languages

Alessio Tosolini, Terra Blevins


Abstract
While modern language model architectures are often assumed to be language-agnostic, there is limited evidence as to whether these models actually process the wide diversity of natural languages equally well. We investigate this question by analyzing how well LMs learn carefully constructed artificial languages that vary in verbal complexity, ranging from simple paradigms to far more verb classes than occur in natural languages. Rather than learning all languages equally efficiently, models trained on these languages show strict preferences for processing simpler languages. Furthermore, while some observed behaviors mimic human linguistic priors, we find that they indicate the models memorize their training data rather than generalize from it.
Anthology ID:
2025.sigtyp-1.2
Volume:
Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Michael Hahn, Priya Rani, Ritesh Kumar, Andreas Shcherbakov, Alexey Sorokin, Oleg Serikov, Ryan Cotterell, Ekaterina Vylomova
Venues:
SIGTYP | WS
Publisher:
Association for Computational Linguistics
Pages:
7–15
URL:
https://preview.aclanthology.org/landing_page/2025.sigtyp-1.2/
Cite (ACL):
Alessio Tosolini and Terra Blevins. 2025. Analyzing the Linguistic Priors of Language Models with Synthetic Languages. In Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 7–15, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Analyzing the Linguistic Priors of Language Models with Synthetic Languages (Tosolini & Blevins, SIGTYP 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.sigtyp-1.2.pdf