Tiny Word Embeddings Using Globally Informed Reconstruction

Sora Ohashi, Mao Isogawa, Tomoyuki Kajiwara, Yuki Arase


Abstract
We reduce the model size of pre-trained word embeddings by a factor of 200 while preserving their quality. Previous studies in this direction created smaller word embedding models by reconstructing pre-trained word representations from those of subwords, which requires storing only a small number of subword embeddings in memory. However, previous studies that train the reconstruction models using only target words cannot drastically reduce the model size while preserving its quality. Inspired by the observation that words with similar meanings have similar embeddings, our reconstruction training learns the global relationships among words, and it can be employed in various models for word embedding reconstruction. Experimental results on word similarity benchmarks show that the proposed method improves the performance of all subword-based reconstruction models.
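The following is a minimal sketch (not the authors' released code) of subword-based embedding reconstruction with an added global term: besides matching each target word vector, the loss also asks the reconstructed vectors in a batch to preserve the pairwise cosine similarities of the original pre-trained vectors. The hashing scheme, model shape, hyperparameters, and toy data below are illustrative assumptions, not details taken from the paper.

# Sketch of subword-based reconstruction with a global similarity-preservation term.
# Assumptions: character trigram subwords, hashed into a small embedding table,
# a linear projection as the reconstruction model, and randomly generated vectors
# standing in for pre-trained word embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

def char_ngrams(word, n=3):
    w = f"<{word}>"
    return [w[i:i + n] for i in range(len(w) - n + 1)]

class SubwordReconstructor(nn.Module):
    def __init__(self, num_buckets=2000, dim=50):
        super().__init__()
        self.num_buckets = num_buckets
        self.subword_emb = nn.Embedding(num_buckets, dim)  # small table kept in memory
        self.proj = nn.Linear(dim, dim)                    # maps subword average to word space

    def forward(self, words):
        vecs = []
        for w in words:
            ids = torch.tensor([hash(g) % self.num_buckets for g in char_ngrams(w)])
            vecs.append(self.subword_emb(ids).mean(dim=0))
        return self.proj(torch.stack(vecs))

def pairwise_cos(x):
    x = F.normalize(x, dim=-1)
    return x @ x.t()

# Toy "pre-trained" embeddings standing in for word2vec/GloVe vectors.
vocab = ["cat", "cats", "dog", "dogs", "run", "running"]
target = {w: torch.randn(50) for w in vocab}

model = SubwordReconstructor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 0.1  # weight of the global term (assumed value)

for step in range(200):
    batch = vocab  # in practice, a sampled minibatch
    gold = torch.stack([target[w] for w in batch])
    pred = model(batch)
    local_loss = F.mse_loss(pred, gold)                               # per-word reconstruction
    global_loss = F.mse_loss(pairwise_cos(pred), pairwise_cos(gold))  # relationships among words
    loss = local_loss + lam * global_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

At inference time, only the small subword table and the projection are stored, so any word vector can be reconstructed on the fly without keeping the full pre-trained embedding matrix in memory.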
Anthology ID:
2020.coling-main.103
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
1199–1203
URL:
https://aclanthology.org/2020.coling-main.103
DOI:
10.18653/v1/2020.coling-main.103
Cite (ACL):
Sora Ohashi, Mao Isogawa, Tomoyuki Kajiwara, and Yuki Arase. 2020. Tiny Word Embeddings Using Globally Informed Reconstruction. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1199–1203, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Tiny Word Embeddings Using Globally Informed Reconstruction (Ohashi et al., COLING 2020)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2020.coling-main.103.pdf