Abstract
Chinese is a logographic writing system, and the shapes of Chinese characters contain rich syntactic and semantic information. In this paper, we propose a model that learns Chinese word embeddings via three-level composition: (1) a convolutional neural network that extracts intra-character compositionality from the visual shape of a character; (2) a recurrent neural network with self-attention that composes character representations into word embeddings; and (3) the Skip-Gram framework, which captures non-compositionality directly from contextual information. Evaluations demonstrate the superior performance of our model on four tasks: word similarity, sentiment analysis, named entity recognition, and part-of-speech tagging.

- Anthology ID:
- N19-1277
- Volume:
- Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota
- Venue:
- NAACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2710–2719
- URL:
- https://aclanthology.org/N19-1277
- DOI:
- 10.18653/v1/N19-1277
- Cite (ACL):
- Chi Sun, Xipeng Qiu, and Xuanjing Huang. 2019. VCWE: Visual Character-Enhanced Word Embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2710–2719, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal):
- VCWE: Visual Character-Enhanced Word Embeddings (Sun et al., NAACL 2019)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/N19-1277.pdf
- Code:
- HSLCY/VCWE
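The three-level composition summarized in the abstract can be illustrated with a toy, dependency-free sketch. It is not the authors' implementation (see the HSLCY/VCWE repository for that). Several choices here are assumptions that the abstract does not specify: the random binary "glyph" bitmaps that stand in for rendered characters, ReLU plus global max pooling after the convolution, a plain Elman RNN, dot-product attention against a single hypothetical `query` vector, and Skip-Gram with negative sampling applied directly to the composed word vector. All sizes (`GLYPH`, `N_KERNELS`, `HIDDEN`) are illustrative.

```python
import math
import random

random.seed(0)  # reproducible toy parameters (assumption: random init, no training)

def rand_matrix(rows, cols, scale=0.1):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Level 1: a CNN over the character's glyph bitmap yields a character vector.
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of a single-channel bitmap with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

def char_vector(bitmap, kernels):
    """One feature per kernel: ReLU, then global max pooling over the feature map."""
    return [max(max(0.0, x) for row in conv2d_valid(bitmap, k) for x in row)
            for k in kernels]

# Level 2: an RNN over character vectors plus self-attention pooling yields a word vector.
def rnn_states(xs, W, U):
    """Elman RNN (assumed cell type): h_t = tanh(W x_t + U h_{t-1}), with h_0 = 0."""
    h = [0.0] * len(U)
    states = []
    for x in xs:
        h = [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, h))]
        states.append(h)
    return states

def attention_pool(states, query):
    """Softmax over scores h_t . query, then the attention-weighted sum of states."""
    scores = [dot(h, query) for h in states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    pooled = [sum(w * h[d] for w, h in zip(weights, states))
              for d in range(len(states[0]))]
    return pooled, weights

# Level 3: the Skip-Gram objective with negative sampling scores the word against contexts.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sgns_loss(word_vec, context_vec, negative_vecs):
    """-log sigmoid(w . c) - sum over negatives n of log sigmoid(-w . n)."""
    loss = -math.log(sigmoid(dot(word_vec, context_vec)))
    for n in negative_vecs:
        loss -= math.log(sigmoid(-dot(word_vec, n)))
    return loss

# Usage: compose a two-character "word" from random glyphs and score one context.
GLYPH, N_KERNELS, HIDDEN = 8, 4, 6
kernels = [rand_matrix(3, 3, 1.0) for _ in range(N_KERNELS)]
W, U = rand_matrix(HIDDEN, N_KERNELS), rand_matrix(HIDDEN, HIDDEN)
query = rand_matrix(1, HIDDEN)[0]

word_glyphs = [[[float(random.random() < 0.3) for _ in range(GLYPH)]
                for _ in range(GLYPH)] for _ in range(2)]
char_vecs = [char_vector(g, kernels) for g in word_glyphs]
word_vec, attn = attention_pool(rnn_states(char_vecs, W, U), query)
context = rand_matrix(1, HIDDEN)[0]
negatives = rand_matrix(3, HIDDEN)
print(len(word_vec), round(sum(attn), 6), sgns_loss(word_vec, context, negatives))
```

Splitting the model this way makes the division of labor explicit. The convolution only sees pixels inside one character, the RNN and attention only see the sequence of character vectors, and only the Skip-Gram loss looks at surrounding words. In a real model all parameters (`kernels`, `W`, `U`, `query`, and the context vectors) would be trained jointly by minimizing this loss. That step is omitted here.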