Field Embedding: A Unified Grain-Based Framework for Word Representation
Junjie Luo, Xi Chen, Jichao Sun, Yuejia Xiang, Ningyu Zhang, Xiang Wan
Abstract
Word representations empowered with additional linguistic information have been widely studied and proved to outperform traditional embeddings. Current methods mainly focus on learning embeddings for words while embeddings of linguistic information (referred to as grain embeddings) are discarded after the learning. This work proposes a framework field embedding to jointly learn both word and grain embeddings by incorporating morphological, phonetic, and syntactical linguistic fields. The framework leverages an innovative fine-grained pipeline that integrates multiple linguistic fields and produces high-quality grain sequences for learning supreme word representations. A novel algorithm is also designed to learn embeddings for words and grains by capturing information that is contained within each field and that is shared across them. Experimental results of lexical tasks and downstream natural language processing tasks illustrate that our framework can learn better word embeddings and grain embeddings. Qualitative evaluations show grain embeddings effectively capture the semantic information.- Anthology ID:
- 2021.naacl-main.140
- Volume:
- Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
- Venue:
- NAACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1754–1762
- Language:
- URL:
- https://aclanthology.org/2021.naacl-main.140
- DOI:
- 10.18653/v1/2021.naacl-main.140
- Cite (ACL):
- Junjie Luo, Xi Chen, Jichao Sun, Yuejia Xiang, Ningyu Zhang, and Xiang Wan. 2021. Field Embedding: A Unified Grain-Based Framework for Word Representation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1754–1762, Online. Association for Computational Linguistics.
- Cite (Informal):
- Field Embedding: A Unified Grain-Based Framework for Word Representation (Luo et al., NAACL 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2021.naacl-main.140.pdf