Abstract
In this study, we compare token representations constructed from visual features (i.e., pixels) with standard lookup-based embeddings. Our goal is to gain insight into the challenges of encoding a text representation from low-level features, e.g., from characters or pixels. We focus on Chinese, which, as a logographic language, has properties that make a representation via visual features challenging and interesting. To train and evaluate different models for the token representation, we chose the task of character-based neural machine translation (NMT) from Chinese to English. We found that a token representation computed only from visual features can achieve results competitive with lookup embeddings. However, we also show that the models exhibit different strengths and weaknesses in a part-of-speech tagging task and in a semantic similarity task. In summary, we show that it is possible to achieve a text representation from pixels alone. We hope that this is a useful stepping stone for future studies that rely exclusively on visual input, or that aim to exploit visual features of written language.
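To make the idea concrete, here is a minimal sketch of how a visual token representation might be computed: each character is rasterized to a small pixel grid and passed through a convolutional encoder whose output vector takes the place of a lookup embedding. This is not the paper's actual architecture (which this page does not specify); the font path, image size, and all layer dimensions are illustrative assumptions.

```python
# Sketch only: rasterize a character glyph and encode it into a dense
# vector, replacing an embedding lookup table. All hyperparameters and
# the font path (a CJK font is required for Chinese) are assumptions.
import torch
import torch.nn as nn
from PIL import Image, ImageDraw, ImageFont

def render_glyph(char, size=24, font_path="NotoSansCJK-Regular.ttc"):
    """Rasterize one character to a (1, size, size) grayscale tensor."""
    img = Image.new("L", (size, size), color=0)
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, size - 4)
    draw.text((2, 0), char, fill=255, font=font)
    pixels = torch.tensor(list(img.getdata()), dtype=torch.float32)
    return pixels.view(1, size, size) / 255.0

class VisualEmbedding(nn.Module):
    """Map glyph images to fixed-size embeddings via a small CNN."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 24x24 -> 12x12
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 12x12 -> 6x6
        )
        self.proj = nn.Linear(64 * 6 * 6, embed_dim)

    def forward(self, glyphs):  # glyphs: (batch, 1, 24, 24)
        h = self.conv(glyphs)
        return self.proj(h.flatten(start_dim=1))  # (batch, embed_dim)

# Usage: embed the characters of a Chinese sentence for an NMT encoder.
chars = list("学习分布式表示")
batch = torch.stack([render_glyph(c) for c in chars])
embeddings = VisualEmbedding()(batch)  # one vector per character
```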
- Anthology ID: W18-3025
- Volume: Proceedings of the Third Workshop on Representation Learning for NLP
- Month: July
- Year: 2018
- Address: Melbourne, Australia
- Venue: RepL4NLP
- SIG: SIGREP
- Publisher: Association for Computational Linguistics
- Pages: 187–194
- URL: https://aclanthology.org/W18-3025
- DOI: 10.18653/v1/W18-3025
- Cite (ACL): Samuel Broscheit. 2018. Learning Distributional Token Representations from Visual Features. In Proceedings of the Third Workshop on Representation Learning for NLP, pages 187–194, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal): Learning Distributional Token Representations from Visual Features (Broscheit, RepL4NLP 2018)
- PDF: https://preview.aclanthology.org/ingestion-script-update/W18-3025.pdf