Vocabulary Shapes Cross-Lingual Variation of Word-Order Learnability in Language Models
Jonas Mayer Martins, Jaap Jumelet, Viola Priesemann, Lisa Beinborn
Abstract
Why do some languages like Czech permit free word order, while others like English do not? We address this question by pretraining transformer language models on a spectrum of synthetic word-order variants of natural languages. We observe that greater word-order irregularity consistently raises model surprisal, indicating reduced learnability. Sentence reversal, however, affects learnability only weakly. A coarse distinction of free- (e.g., Czech and Finnish) and fixed-word-order languages (e.g., English and French) does not explain cross-lingual variation. Instead, the structure of the word and subword vocabulary strongly predicts the model surprisal. Overall, vocabulary structure emerges as a key driver of computational word-order learnability across languages.
- Anthology ID:
- 2026.acl-long.1510
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 32724–32740
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1510/
- Cite (ACL):
- Jonas Mayer Martins, Jaap Jumelet, Viola Priesemann, and Lisa Beinborn. 2026. Vocabulary Shapes Cross-Lingual Variation of Word-Order Learnability in Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32724–32740, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Vocabulary Shapes Cross-Lingual Variation of Word-Order Learnability in Language Models (Martins et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1510.pdf