Abstract
In this paper, we propose to learn word embeddings based on the recent fixed-size ordinally forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence into a fixed-size representation. We use FOFE to fully encode the left and right context of each word in a corpus to construct a novel word-context matrix, which is further weighted and factorized using truncated SVD to generate low-dimensional word embedding vectors. We evaluate this alternative method of encoding word-context statistics and show that the new FOFE method has a notable effect on the resulting word embeddings. Experimental results on several popular word similarity tasks demonstrate that the proposed method outperforms other SVD models that use canonical count-based techniques to generate word-context matrices.
- Anthology ID:
- D17-1031
- Volume:
- Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
- Month:
- September
- Year:
- 2017
- Address:
- Copenhagen, Denmark
- Editors:
- Martha Palmer, Rebecca Hwa, Sebastian Riedel
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 310–315
- URL:
- https://aclanthology.org/D17-1031
- DOI:
- 10.18653/v1/D17-1031
- Cite (ACL):
- Joseph Sanu, Mingbin Xu, Hui Jiang, and Quan Liu. 2017. Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 310–315, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal):
- Word Embeddings based on Fixed-Size Ordinally Forgetting Encoding (Sanu et al., EMNLP 2017)
- PDF:
- https://preview.aclanthology.org/fix-dup-bibkey/D17-1031.pdf
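
The pipeline summarized in the abstract (FOFE-encode the left and right context of each word, accumulate the codes into a word-context matrix, then factorize with truncated SVD) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the forgetting factor `alpha=0.5`, the use of whole sentences as context, and the toy corpus are all assumptions, and the weighting step mentioned in the abstract is omitted because the abstract does not specify it. The recursion `z_t = alpha * z_{t-1} + e_t` is the standard FOFE update, where `e_t` is the one-hot vector of the t-th word.

```python
import numpy as np


def fofe_encode(ids, vocab_size, alpha):
    """Apply the FOFE recursion z_t = alpha * z_{t-1} + e_t to a list of word ids."""
    z = np.zeros(vocab_size)
    for i in ids:
        z *= alpha   # forget older words by a factor of alpha
        z[i] += 1.0  # add the one-hot vector of the current word
    return z


def fofe_word_context_matrix(sentences, vocab, alpha=0.5):
    """Accumulate left and right FOFE codes for every word occurrence.

    Row w holds [sum of left-context codes | sum of right-context codes],
    so the matrix has shape (V, 2V). Using the whole sentence as context
    is an assumption made for this sketch.
    """
    V = len(vocab)
    M = np.zeros((V, 2 * V))
    for sent in sentences:
        ids = [vocab[w] for w in sent]
        for pos, focus in enumerate(ids):
            # Left context is read left to right, so the nearest word is encoded last.
            left = fofe_encode(ids[:pos], V, alpha)
            # Right context is read right to left, so the nearest word is again encoded last.
            right = fofe_encode(ids[pos + 1:][::-1], V, alpha)
            M[focus, :V] += left
            M[focus, V:] += right
    return M


def truncated_svd_embeddings(M, dim):
    """Keep the top `dim` singular directions as word embeddings (rows of U * S)."""
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :dim] * S[:dim]


# Hypothetical toy corpus and vocabulary, for illustration only.
vocab = {"a": 0, "b": 1, "c": 2}
sentences = [["a", "b", "c"]]

code = fofe_encode([0, 1, 2], vocab_size=3, alpha=0.5)  # -> [0.25, 0.5, 1.0]
M = fofe_word_context_matrix(sentences, vocab, alpha=0.5)
embeddings = truncated_svd_embeddings(M, dim=2)
```

In this sketch the word nearest to the focus word always receives weight 1 and each step further away multiplies the weight by `alpha`, which is how the encoding keeps order information that a plain co-occurrence count discards.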