Improved Word Embeddings with Implicit Structure Information

Jie Shen, Cong Liu


Abstract
Distributed word representations are an efficient way to capture semantic and syntactic relations between words. In this work, we introduce an extension to the continuous bag-of-words model that learns word representations efficiently by using implicit structure information. Instead of relying on a syntactic parser, which might be noisy and slow to build, we use an efficient heuristic to compute weights representing probabilities of syntactic relations based on the Huffman softmax tree. The “implicit graphs” constructed from these weights show that the weights contain useful implicit structure information. Extensive experiments on several word similarity and word analogy tasks show gains compared to the basic continuous bag-of-words model.
Anthology ID:
C16-1227
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Editors:
Yuji Matsumoto, Rashmi Prasad
Venue:
COLING
Publisher:
The COLING 2016 Organizing Committee
Pages:
2408–2417
URL:
https://aclanthology.org/C16-1227
Cite (ACL):
Jie Shen and Cong Liu. 2016. Improved Word Embeddings with Implicit Structure Information. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2408–2417, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal):
Improved Word Embeddings with Implicit Structure Information (Shen & Liu, COLING 2016)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/C16-1227.pdf