Explaining and Generalizing Skip-Gram through Exponential Family Principal Component Analysis
Ryan Cotterell, Adam Poliak, Benjamin Van Durme, Jason Eisner
Abstract
The popular skip-gram model induces word embeddings by exploiting the signal from word-context co-occurrence. We offer a new interpretation of skip-gram based on exponential family PCA, a form of matrix factorization, and use it to generalize the skip-gram model to tensor factorization. In turn, this lets us train embeddings through richer higher-order co-occurrences, e.g., triples that include positional information (to incorporate syntax) or morphological information (to share parameters across related words). We experiment on 40 languages and show that our model improves upon skip-gram.
- Anthology ID: E17-2028
- Volume: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
- Month: April
- Year: 2017
- Address: Valencia, Spain
- Editors: Mirella Lapata, Phil Blunsom, Alexander Koller
- Venue: EACL
- Publisher: Association for Computational Linguistics
- Pages: 175–181
- URL: https://preview.aclanthology.org/build-pipeline-with-new-library/E17-2028/
- Cite (ACL): Ryan Cotterell, Adam Poliak, Benjamin Van Durme, and Jason Eisner. 2017. Explaining and Generalizing Skip-Gram through Exponential Family Principal Component Analysis. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 175–181, Valencia, Spain. Association for Computational Linguistics.
- Cite (Informal): Explaining and Generalizing Skip-Gram through Exponential Family Principal Component Analysis (Cotterell et al., EACL 2017)
- PDF: https://preview.aclanthology.org/build-pipeline-with-new-library/E17-2028.pdf
- Data: Universal Dependencies
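The abstract's central idea, reading skip-gram as exponential family PCA and then replacing the word-context matrix with a higher-order tensor, can be illustrated with a small sketch. This is an illustration under stated assumptions, not the authors' implementation or exact objective: it assumes a Bernoulli likelihood over binary co-occurrence indicators, a low-rank natural-parameter matrix `W @ C.T` for the matrix case, and a CP-style trilinear form for the three-way case, where the third mode stands in for extra information such as word position or morphology. The function names `bernoulli_nll`, `fit_matrix`, and `fit_tensor`, as well as the initialization scale, learning rate, and step count, are hypothetical choices made for this example.

```python
# Illustrative sketch only: the Bernoulli likelihood, the CP-style tensor form,
# and all hyperparameters are assumptions, not the paper's exact setup.
import numpy as np


def bernoulli_nll(X, theta):
    """Bernoulli negative log-likelihood in natural-parameter form:
    sum of A(theta) - x * theta, with log-partition A(theta) = log(1 + e^theta)."""
    return float(np.sum(np.logaddexp(0.0, theta) - X * theta))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_matrix(X, rank, steps=500, lr=0.05, seed=0):
    """Exponential family PCA with an (assumed) Bernoulli likelihood:
    X[i, j] ~ Bernoulli(sigmoid(theta[i, j])) with low-rank theta = W @ C.T."""
    rng = np.random.default_rng(seed)
    W = 0.3 * rng.standard_normal((X.shape[0], rank))  # word vectors
    C = 0.3 * rng.standard_normal((X.shape[1], rank))  # context vectors
    losses = []
    for _ in range(steps):
        theta = W @ C.T
        losses.append(bernoulli_nll(X, theta))
        R = sigmoid(theta) - X  # gradient of the NLL with respect to theta
        # Simultaneous full-batch gradient step on both factor matrices.
        W, C = W - lr * (R @ C), C - lr * (R.T @ W)
    losses.append(bernoulli_nll(X, W @ C.T))
    return W, C, losses


def fit_tensor(X, rank, steps=500, lr=0.05, seed=0):
    """Three-way analogue: X[i, j, k] is a binary (word, context, extra) indicator
    and theta[i, j, k] = sum_d W[i, d] * C[j, d] * P[k, d] (assumed CP form)."""
    rng = np.random.default_rng(seed)
    W, C, P = (0.3 * rng.standard_normal((n, rank)) for n in X.shape)
    losses = []
    for _ in range(steps):
        theta = np.einsum("id,jd,kd->ijk", W, C, P)
        losses.append(bernoulli_nll(X, theta))
        R = sigmoid(theta) - X  # same residual as in the matrix case
        gW = np.einsum("ijk,jd,kd->id", R, C, P)
        gC = np.einsum("ijk,id,kd->jd", R, W, P)
        gP = np.einsum("ijk,id,jd->kd", R, W, C)
        W, C, P = W - lr * gW, C - lr * gC, P - lr * gP
    losses.append(bernoulli_nll(X, np.einsum("id,jd,kd->ijk", W, C, P)))
    return W, C, P, losses
```

Both fits use full-batch gradient descent over every cell of the array, which keeps the example short; word2vec-style training instead makes stochastic updates from sampled word-context pairs. The two functions differ only in how the natural parameters are built from the factor matrices, which is the sense in which the tensor model generalizes the matrix one.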