Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology

Yang Xu, Jiasheng Zhang, David Reitter


Abstract
We use a variant of the word embedding model that incorporates subword information to characterize the degree of compositionality in lexical semantics. Our models reveal contrasting patterns of long-term change across multiple languages: Indo-European languages put more weight on subword units in newer words, whereas Chinese puts less weight on subwords and more weight on the word as a whole. Our method provides novel evidence and methodology that enrich existing theories in evolutionary linguistics. The resulting word vectors also have decent performance on NLP-related tasks.
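As a rough illustration of what "incorporating subword information" can mean, the sketch below builds a word representation as a weighted mix of a whole-word vector and the mean of its character n-gram vectors. It is not the authors' model: the fastText-style boundary markers, the function names `char_ngrams` and `compose`, and the scalar weights `w_word` and `w_sub` are all assumptions made for this example.

```python
from typing import List, Sequence


def char_ngrams(word: str, n_min: int = 3, n_max: int = 4) -> List[str]:
    """Character n-grams of a word padded with '<' and '>' boundary markers.

    The padding scheme follows the common fastText convention (an assumption here).
    """
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def compose(
    word_vec: Sequence[float],
    subword_vecs: Sequence[Sequence[float]],
    w_word: float,
    w_sub: float,
) -> List[float]:
    """Weighted sum of the whole-word vector and the mean subword vector.

    w_word and w_sub are hypothetical scalar weights; a larger w_sub means the
    representation leans more on the word's internal parts.
    """
    dim = len(word_vec)
    count = len(subword_vecs)
    # Average the subword vectors dimension by dimension.
    mean_sub = [sum(vec[d] for vec in subword_vecs) / count for d in range(dim)]
    return [w_word * word_vec[d] + w_sub * mean_sub[d] for d in range(dim)]


if __name__ == "__main__":
    print(char_ngrams("cat", 3, 3))
    print(compose([1.0, 0.0], [[0.0, 1.0], [0.0, 3.0]], 0.5, 0.5))
```

In a setup like this, the share `w_sub / (w_word + w_sub)` would be one simple way to express how much a word's meaning is attributed to its parts rather than to the word as a whole.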
Anthology ID:
W19-4717
Volume:
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
Month:
August
Year:
2019
Address:
Florence, Italy
Venue:
LChange
Publisher:
Association for Computational Linguistics
Pages:
136–145
URL:
https://aclanthology.org/W19-4717
DOI:
10.18653/v1/W19-4717
Cite (ACL):
Yang Xu, Jiasheng Zhang, and David Reitter. 2019. Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pages 136–145, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Treat the Word As a Whole or Look Inside? Subword Embeddings Model Language Change and Typology (Xu et al., LChange 2019)
PDF:
https://preview.aclanthology.org/remove-xml-comments/W19-4717.pdf
Code:
innerfirexy/lchange2019