Multi-prototype Chinese Character Embedding

Yanan Lu; Yue Zhang; Donghong Ji

Abstract

Chinese sentences are written as sequences of characters, which are elementary units of syntax and semantics. Characters are highly polysemous when forming words. We present a position-sensitive skip-gram model to learn multi-prototype Chinese character embeddings, and explore the usefulness of such character embeddings for Chinese NLP tasks. Evaluation on character similarity shows that multi-prototype embeddings are significantly better than a single-prototype baseline. In addition, when used as features in the Chinese NER task, the embeddings yield a 1.74% F-score improvement over a state-of-the-art baseline.

Anthology ID: L16-1138
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 855–859
URL: https://aclanthology.org/L16-1138
Cite (ACL): Yanan Lu, Yue Zhang, and Donghong Ji. 2016. Multi-prototype Chinese Character Embedding. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 855–859, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): Multi-prototype Chinese Character Embedding (Lu et al., LREC 2016)
PDF: https://preview.aclanthology.org/auto-file-uploads/L16-1138.pdf
