VCWE: Visual Character-Enhanced Word Embeddings

Chi Sun; Xipeng Qiu; Xuan-Jing Huang

doi:10.18653/v1/N19-1277

Abstract

Chinese is a logographic writing system, and the shapes of Chinese characters contain rich syntactic and semantic information. In this paper, we propose a model to learn Chinese word embeddings via three-level composition: (1) a convolutional neural network to extract the intra-character compositionality from the visual shape of a character; (2) a recurrent neural network with self-attention to compose character representations into word embeddings; (3) the Skip-Gram framework to capture non-compositionality directly from the contextual information. Evaluations demonstrate the superior performance of our model on four tasks: word similarity, sentiment analysis, named entity recognition, and part-of-speech tagging.
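
As a rough illustration of levels (2) and (3) above, the following sketch composes per-character vectors into a word vector with attention weights and then scores the result against a context word with a Skip-Gram-style loss. It is a minimal sketch, not the authors' implementation: the character vectors are hard-coded stand-ins for the CNN output, the recurrent layer is omitted, a single query vector replaces learned self-attention parameters, and negative sampling is an assumed training objective that the abstract does not specify. All function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def compose_word(char_vectors, attention_query):
    """Attention-weighted sum of character vectors (hypothetical helper).

    char_vectors stands in for per-character features that the paper
    obtains from a CNN over character images followed by an RNN.
    """
    weights = softmax([dot(c, attention_query) for c in char_vectors])
    dim = len(char_vectors[0])
    word = [sum(w * c[i] for w, c in zip(weights, char_vectors))
            for i in range(dim)]
    return word, weights


def skipgram_ns_loss(word_vec, context_vec, negative_vecs):
    """Skip-Gram loss with negative sampling (assumed objective)."""
    loss = -math.log(sigmoid(dot(word_vec, context_vec)))
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(word_vec, neg)))
    return loss


# Toy two-character word with 2-dimensional character vectors.
char_vectors = [[1.0, 0.0], [0.0, 1.0]]
attention_query = [1.0, 0.0]  # assumed query; favors the first character

word, weights = compose_word(char_vectors, attention_query)
loss = skipgram_ns_loss(word, context_vec=[0.5, 0.5],
                        negative_vecs=[[-0.5, 0.2]])
```

In this toy setup the attention weights sum to one and the first character receives the larger weight, so the composed word vector leans toward it. In a trained model the query, the character features, and the context vectors would all be learned jointly.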

Anthology ID: N19-1277
Volume: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 2710–2719
URL: https://aclanthology.org/N19-1277
DOI: 10.18653/v1/N19-1277
Cite (ACL): Chi Sun, Xipeng Qiu, and Xuanjing Huang. 2019. VCWE: Visual Character-Enhanced Word Embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2710–2719, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal): VCWE: Visual Character-Enhanced Word Embeddings (Sun et al., NAACL 2019)
PDF: https://preview.aclanthology.org/ingestion-script-update/N19-1277.pdf
Code: HSLCY/VCWE