Understanding the Source of Semantic Regularities in Word Embeddings

Hsiao-Yu Chiang, Jose Camacho-Collados, Zachary Pardos


Abstract
Semantic relations are core to how humans understand and express concepts in the real world using language. Recently, there has been a thread of research aimed at modeling these relations by learning vector representations from text corpora. Most of these approaches focus strictly on leveraging the co-occurrences of relationship word pairs within sentences. In this paper, we investigate the hypothesis that examples of a lexical relation in a corpus are fundamental to a neural word embedding’s ability to complete analogies involving the relation. Our experiments, in which we remove all known examples of a relation from training corpora, show only marginal degradation in analogy completion performance involving the removed relation. This finding enhances our understanding of neural word embeddings, showing that co-occurrence information of a particular semantic relation is not the main source of their structural regularity.
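The corpus ablation described in the abstract can be illustrated with a minimal sketch. It assumes that a "known example" of a relation is any sentence in which both words of a relation pair co-occur, and that lowercased whitespace tokenization is adequate. The function name, the toy corpus, and the pair list are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Tuple


def remove_relation_examples(
    sentences: Iterable[str],
    pairs: Iterable[Tuple[str, str]],
) -> List[str]:
    """Keep only sentences in which no known relation pair co-occurs.

    Assumption: co-occurrence is checked at the sentence level on
    lowercased, whitespace-separated tokens.
    """
    pair_sets = [frozenset((a.lower(), b.lower())) for a, b in pairs]
    kept = []
    for sentence in sentences:
        tokens = {tok.lower() for tok in sentence.split()}
        # Skip the sentence if both words of any pair are present.
        if any(pair <= tokens for pair in pair_sets):
            continue
        kept.append(sentence)
    return kept


# Hypothetical toy data for a capital-of relation.
corpus = [
    "Paris is the capital of France",
    "France borders Spain",
    "Paris hosts many museums",
]
capital_pairs = [("Paris", "France")]

filtered = remove_relation_examples(corpus, capital_pairs)
```

Under these assumptions, embeddings would be trained once on `corpus` and once on `filtered`, and analogy completion accuracy for the removed relation would be compared between the two.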
Anthology ID:
2020.conll-1.9
Volume:
Proceedings of the 24th Conference on Computational Natural Language Learning
Month:
November
Year:
2020
Address:
Online
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
119–131
URL:
https://aclanthology.org/2020.conll-1.9
DOI:
10.18653/v1/2020.conll-1.9
Cite (ACL):
Hsiao-Yu Chiang, Jose Camacho-Collados, and Zachary Pardos. 2020. Understanding the Source of Semantic Regularities in Word Embeddings. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 119–131, Online. Association for Computational Linguistics.
Cite (Informal):
Understanding the Source of Semantic Regularities in Word Embeddings (Chiang et al., CoNLL 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.conll-1.9.pdf