Jiasheng Zhang


2019

Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology
Yang Xu | Jiasheng Zhang | David Reitter
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

We use a variant of the word embedding model that incorporates subword information to characterize the degree of compositionality in lexical semantics. Our models reveal interesting yet contrasting patterns of long-term change in multiple languages: Indo-European languages put more weight on subword units in newer words, whereas Chinese puts less weight on the subwords and more weight on the word as a whole. Our method provides novel evidence and methodology that enrich existing theories in evolutionary linguistics. The resulting word vectors also have decent performance in NLP-related tasks.