Hannah S. Rognan


2025

A discovery procedure for synlexification patterns in the world’s languages
Hannah S. Rognan | Barend Beekhuizen
Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

Synlexification is the pattern of crosslinguistic lexical semantic variation whereby what is expressed in a single word in one language is expressed in multiple words in another (e.g., French ‘monter’ vs. English ‘go+up’). We introduce a computational method for automatically extracting instances of synlexification from a parallel corpus at a large scale (many languages, many domains). The method involves debiasing the seed language by splitting up synlexifications in the seed language where other languages consistently split them. The method was applied to a massively parallel corpus of 198 Bible translations. We validate it on a broad sample of cases, and demonstrate its potential for typological research.
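The debiasing step can be made concrete with a toy sketch. A seed-language word is flagged when most other languages consistently render it with more than one word. This is not the authors' implementation. The input format, the function names `split_rate` and `consistently_split`, the thresholds `min_rate` and `min_lang_share`, and the language labels and counts are all hypothetical and invented for illustration. The input format records, for each occurrence of a seed word, how many target-language words it was aligned to.

```python
# Toy illustration only; all names, thresholds, and data are hypothetical.
# alignments[seed_word][language] = list of aligned target-word counts,
# one entry per occurrence of seed_word in the parallel text.


def split_rate(counts):
    """Fraction of occurrences in which the seed word maps to >1 target word."""
    return sum(1 for n in counts if n > 1) / len(counts)


def consistently_split(alignments, min_rate=0.8, min_lang_share=0.5):
    """Return {seed_word: share of languages that consistently split it}.

    A language 'consistently splits' a seed word if at least `min_rate` of
    the word's occurrences align to more than one target word. A seed word
    is flagged if at least `min_lang_share` of its languages do so.
    """
    flagged = {}
    for word, by_lang in alignments.items():
        langs = [lang for lang, counts in by_lang.items() if counts]
        if not langs:
            continue  # no alignment evidence for this word
        splitting = [lang for lang in langs if split_rate(by_lang[lang]) >= min_rate]
        share = len(splitting) / len(langs)
        if share >= min_lang_share:
            flagged[word] = share
    return flagged


# Invented counts for two French seed words in three unnamed languages.
toy = {
    "monter": {"lang_a": [2, 2, 2], "lang_b": [2, 1, 2, 2, 2], "lang_c": [2, 2]},
    "manger": {"lang_a": [1, 1, 1], "lang_b": [1, 1], "lang_c": [1, 1, 2]},
}

print(consistently_split(toy))  # {'monter': 1.0}
```

Flagged seed words would then be candidates for splitting before extraction. A real pipeline would also need word alignment and a way to decide how a flagged word is divided into parts, neither of which this sketch attempts.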