Abstract
Conventional word embeddings are trained with specific criteria (e.g., based on language modeling or co-occurrence) within a single information source, disregarding the opportunity for further calibration using external knowledge. This paper presents a unified framework that leverages pre-learned or external priors, in the form of a regularizer, to enhance conventional language model-based embedding learning. We consider two types of regularizers: the first is derived from topic distributions obtained by running LDA on unlabeled data, and the second is based on dictionaries created through human annotation. To learn effectively with these regularizers, we propose a novel data structure, trajectory softmax. The resulting embeddings are evaluated on word similarity and sentiment classification tasks. Experimental results show that our learning framework with regularization from prior knowledge improves embedding quality across multiple datasets compared with a diverse collection of baseline methods.
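The abstract only sketches the framework at a high level. As an illustration of the general idea of adding a prior-knowledge regularizer to an embedding objective, and not the paper's exact formulation, the toy sketch below penalizes distance between embeddings of words that a dictionary marks as related; the names `synonym_pairs`, `skipgram_like_loss`, `regularizer`, and `lam` are hypothetical placeholders.

```python
import numpy as np

# Toy illustration (assumed form, not the paper's exact objective):
# an embedding loss plus a regularizer built from prior knowledge,
# here a dictionary listing word pairs that should be similar.

rng = np.random.default_rng(0)
vocab = ["good", "great", "bad", "movie", "film"]
dim = 8
E = rng.normal(scale=0.1, size=(len(vocab), dim))  # word embedding matrix

# Prior knowledge: indices of word pairs assumed to be near-synonyms.
synonym_pairs = [(0, 1), (3, 4)]  # (good, great), (movie, film)

def regularizer(E, pairs):
    """Sum of squared distances between embeddings of related words."""
    return sum(np.sum((E[i] - E[j]) ** 2) for i, j in pairs)

def skipgram_like_loss(E, center, context):
    """Stand-in for the language-model term: negative dot product."""
    return -float(E[center] @ E[context])

lam = 0.1  # regularization strength (hypothetical value)
total = skipgram_like_loss(E, 0, 3) + lam * regularizer(E, synonym_pairs)
print(f"toy regularized objective: {total:.4f}")
```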
- Anthology ID: K17-1016
- Volume: Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)
- Month: August
- Year: 2017
- Address: Vancouver, Canada
- Editors: Roger Levy, Lucia Specia
- Venue: CoNLL
- SIG: SIGNLL
- Publisher: Association for Computational Linguistics
- Pages: 143–152
- URL: https://aclanthology.org/K17-1016
- DOI: 10.18653/v1/K17-1016
- Cite (ACL): Yan Song, Chia-Jung Lee, and Fei Xia. 2017. Learning Word Representations with Regularization from Prior Knowledge. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 143–152, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal): Learning Word Representations with Regularization from Prior Knowledge (Song et al., CoNLL 2017)
- PDF: https://preview.aclanthology.org/emnlp22-frontmatter/K17-1016.pdf
- Data: IMDb Movie Reviews