Abstract
We propose a novel method that exploits the visual information of ideograms and logograms for analyzing Japanese review documents. Our method first converts font images of Japanese characters into character embeddings using convolutional neural networks. It then constructs document embeddings from the character embeddings based on Hierarchical Attention Networks, which represent documents through attention mechanisms applied from the character level up to the sentence level. The document embeddings are finally used to predict the labels of the documents. Our method thus provides a way to exploit the visual features of characters in languages with ideograms and logograms. In our experiments, the method achieved accuracy comparable to a character embedding-based model while using far fewer parameters, since it does not need to store embeddings for thousands of characters.
- Anthology ID: I17-2064
- Volume: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
- Month: November
- Year: 2017
- Address: Taipei, Taiwan
- Venue: IJCNLP
- Publisher: Asian Federation of Natural Language Processing
- Pages: 378–382
- URL: https://aclanthology.org/I17-2064
- Cite (ACL): Yota Toyama, Makoto Miwa, and Yutaka Sasaki. 2017. Utilizing Visual Forms of Japanese Characters for Neural Review Classification. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 378–382, Taipei, Taiwan. Asian Federation of Natural Language Processing.
- Cite (Informal): Utilizing Visual Forms of Japanese Characters for Neural Review Classification (Toyama et al., IJCNLP 2017)
- PDF: https://preview.aclanthology.org/ingestion-script-update/I17-2064.pdf
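The pipeline summarized in the abstract (character image, then a CNN character embedding, then attention pooling from characters to sentences and from sentences to the document, then a label classifier) can be illustrated with the minimal, untrained sketch below. It is not the authors' implementation. The layer sizes, the single convolution layer, the random weights, and every name (`glyph_embedding`, `attention_pool`, `classify`, the toy glyphs `ICHI`, `TATE`, `JUU`) are hypothetical. Real font rendering is replaced by hand-drawn 5x5 bitmaps, and the recurrent (GRU) encoders that Hierarchical Attention Networks normally place before each attention layer are omitted.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation (stride 1, no padding), as in a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def glyph_embedding(glyph: Matrix, kernels: List[Matrix]) -> Vector:
    """Character embedding from a glyph bitmap: conv + ReLU + global max pooling,
    giving one feature per kernel (a simplified stand-in for the paper's CNN)."""
    return [max(max(0.0, v) for row in conv2d_valid(glyph, k) for v in row)
            for k in kernels]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Vector], W: Matrix, b: Vector,
                   context: Vector) -> Tuple[Vector, Vector]:
    """HAN-style attention pooling:
    u_i = tanh(W h_i + b), alpha = softmax(u_i . context), output = sum_i alpha_i h_i."""
    scores = []
    for h in vectors:
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    alpha = softmax(scores)
    pooled = [sum(a * h[d] for a, h in zip(alpha, vectors))
              for d in range(len(vectors[0]))]
    return pooled, alpha


def classify(document: List[List[Matrix]], kernels: List[Matrix], params: dict) -> Vector:
    """document = list of sentences; sentence = list of glyph bitmaps.
    Characters -> sentence vectors -> document vector -> label probabilities."""
    sentence_vecs = []
    for sentence in document:
        chars = [glyph_embedding(g, kernels) for g in sentence]
        sentence_vec, _ = attention_pool(chars, *params["char_att"])
        sentence_vecs.append(sentence_vec)
    doc_vec, _ = attention_pool(sentence_vecs, *params["sent_att"])
    W_out, b_out = params["out"]
    logits = [sum(w * x for w, x in zip(row, doc_vec)) + bias
              for row, bias in zip(W_out, b_out)]
    return softmax(logits)


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy setup: untrained random weights and 5x5 hand-drawn "glyphs".
EMB, ATT, LABELS = 4, 3, 2
rng = random.Random(0)
kernels = [rand_matrix(rng, 3, 3) for _ in range(EMB)]
params = {
    "char_att": (rand_matrix(rng, ATT, EMB), [0.0] * ATT, rand_matrix(rng, 1, ATT)[0]),
    "sent_att": (rand_matrix(rng, ATT, EMB), [0.0] * ATT, rand_matrix(rng, 1, ATT)[0]),
    "out": (rand_matrix(rng, LABELS, EMB), [0.0] * LABELS),
}

ICHI = [[1.0 if r == 2 else 0.0 for _ in range(5)] for r in range(5)]      # like 一
TATE = [[1.0 if c == 2 else 0.0 for c in range(5)] for _ in range(5)]      # like 丨
JUU = [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(ICHI, TATE)]  # like 十

review = [[ICHI, JUU], [TATE, JUU, ICHI]]  # two toy "sentences"
probs = classify(review, kernels, params)
print(probs)  # two probabilities summing to 1; values depend on the random weights
```

Because every character vector is computed from its image by shared convolution kernels, the number of embedding parameters in this sketch depends on the kernel sizes rather than on the vocabulary size. This mirrors the intuition behind the abstract's parameter-saving claim, whereas a lookup-table baseline would store one vector per character.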