A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings

Wei Yang, Wei Lu, Vincent Zheng


Abstract
Learning word embeddings has received a significant amount of attention recently. Often, word embeddings are learned in an unsupervised manner from a large collection of text. The genre of the text typically plays an important role in the effectiveness of the resulting embeddings. How to effectively train word embedding models using data from different domains remains a relatively unexplored problem. In this paper, we present a simple yet effective method for learning word embeddings based on text from different domains. We demonstrate the effectiveness of our approach through extensive experiments on various downstream NLP tasks.
Anthology ID:
D17-1312
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2898–2904
URL:
https://aclanthology.org/D17-1312
DOI:
10.18653/v1/D17-1312
Cite (ACL):
Wei Yang, Wei Lu, and Vincent Zheng. 2017. A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2898–2904, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
A Simple Regularization-based Algorithm for Learning Cross-Domain Word Embeddings (Yang et al., EMNLP 2017)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/D17-1312.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-1/D17-1312.mp4