Abstract
Pre-trained word embeddings are widely used across many fields. However, their coverage is limited to words that appear in the corpora on which the embeddings were trained. Words absent from the training corpus are therefore ignored at task time, which can limit the performance of neural models. In this paper, we propose a simple yet effective method for representing out-of-vocabulary (OOV) words. Unlike prior work that relies solely on subword information or on lexical knowledge, our method exploits both sources to represent OOV words. To this end, we propose two stages of representation learning. In the first stage, we learn subword embeddings from the pre-trained word embeddings using an additive composition function over subwords. In the second stage, we map the learned subwords into semantic networks (e.g., WordNet) and re-train the subword embeddings using lexical entries from semantic lexicons, which can include newly observed subwords. This two-stage learning greatly broadens word coverage. Experimental results clearly show that our method provides consistent performance improvements over strong baselines that use subwords or lexical resources separately.
- Anthology ID:
- 2020.lrec-1.587
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 4774–4780
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.587
- Cite (ACL):
- Yeachan Kim, Kang-Min Kim, and SangKeun Lee. 2020. Representation Learning for Unseen Words by Bridging Subwords to Semantic Networks. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4774–4780, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Representation Learning for Unseen Words by Bridging Subwords to Semantic Networks (Kim et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/revert-3132-ingestion-checklist/2020.lrec-1.587.pdf
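The first stage described in the abstract can be sketched in code. This is not the authors' implementation; it is a minimal illustration of the idea of fitting subword embeddings so that the additive composition (here, a plain sum) of a word's subword vectors approximates its pre-trained word embedding, and of then composing those subwords to represent an unseen word. The character n-gram extraction, loss, and hyperparameters are illustrative assumptions; the second stage (re-training against a semantic lexicon such as WordNet) is omitted.

```python
import numpy as np

def char_ngrams(word, n=3):
    """Extract character n-grams with boundary markers, e.g. 'cat' -> '<ca', 'cat', 'at>'."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def learn_subword_embeddings(word_vecs, dim, epochs=500, lr=0.1, seed=0):
    """Fit subword vectors by minimizing ||sum(subword vectors) - word vector||^2 with SGD.

    word_vecs: dict mapping word -> pre-trained embedding (np.ndarray of shape (dim,)).
    Returns the subword embedding matrix and a subword -> row index mapping.
    """
    rng = np.random.default_rng(seed)
    # Build the subword inventory from the in-vocabulary training words.
    subwords = sorted({sw for w in word_vecs for sw in char_ngrams(w)})
    index = {sw: i for i, sw in enumerate(subwords)}
    S = rng.normal(scale=0.1, size=(len(subwords), dim))
    for _ in range(epochs):
        for word, target in word_vecs.items():
            ids = [index[sw] for sw in char_ngrams(word)]
            pred = S[ids].sum(axis=0)        # additive composition of subwords
            grad = 2.0 * (pred - target)     # gradient of the squared error w.r.t. pred
            # Unbuffered update so repeated n-grams within a word are each updated.
            np.subtract.at(S, ids, lr * grad / len(ids))
    return S, index

def embed_oov(word, S, index, dim):
    """Represent an unseen word as the sum of its known subword vectors."""
    ids = [index[sw] for sw in char_ngrams(word) if sw in index]
    return S[ids].sum(axis=0) if ids else np.zeros(dim)
```

Under this sketch, an OOV word such as "catz" would be embedded through the subwords it shares with in-vocabulary words (e.g. "<ca", "cat"); subwords never seen in stage one are exactly what stage two's mapping to semantic lexicons is meant to cover.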