A discovery procedure for synlexification patterns in the world’s languages

Hannah S. Rognan, Barend Beekhuizen


Abstract
Synlexification is the pattern of crosslinguistic lexical semantic variation whereby what is expressed in a single word in one language is expressed in multiple words in another (e.g., French ‘monter’ vs. English ‘go+up’). We introduce a computational method for automatically extracting instances of synlexification from a parallel corpus at a large scale (many languages, many domains). The method involves debiasing the seed language by splitting up synlexifications in the seed language where other languages consistently split them. The method was applied to a massively parallel corpus of 198 Bible translations. We validate it on a broad sample of cases, and demonstrate its potential for typological research.
Anthology ID:
2025.sigtyp-1.12
Volume:
Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Michael Hahn, Priya Rani, Ritesh Kumar, Andreas Shcherbakov, Alexey Sorokin, Oleg Serikov, Ryan Cotterell, Ekaterina Vylomova
Venues:
SIGTYP | WS
Publisher:
Association for Computational Linguistics
Pages:
93–113
URL:
https://preview.aclanthology.org/landing_page/2025.sigtyp-1.12/
Cite (ACL):
Hannah S. Rognan and Barend Beekhuizen. 2025. A discovery procedure for synlexification patterns in the world’s languages. In Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 93–113, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
A discovery procedure for synlexification patterns in the world’s languages (Rognan & Beekhuizen, SIGTYP 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.sigtyp-1.12.pdf