Analyzing Word Embedding Through Structural Equation Modeling

Namgi Han, Katsuhiko Hayashi, Yusuke Miyao


Abstract
Many researchers have tried to predict the accuracies of extrinsic evaluation by using intrinsic evaluation to evaluate word embedding. The relationship between intrinsic and extrinsic evaluation, however, has only been studied with simple correlation analysis, which has difficulty capturing complex cause-effect relationships and integrating external factors such as the hyperparameters of word embedding. To tackle this problem, we employ partial least squares path modeling (PLS-PM), a method of structural equation modeling developed for causal analysis. We propose a causal diagram consisting of the evaluation results on the BATS, VecEval, and SentEval datasets, with a causal hypothesis that linguistic knowledge encoded in word embedding contributes to solving downstream tasks. Our PLS-PM models are estimated with 600 word embeddings, and we prove the existence of causal relations between linguistic knowledge evaluated on BATS and the accuracies of downstream tasks evaluated on VecEval and SentEval in our PLS-PM models. Moreover, we show that the PLS-PM models are useful for analyzing the effect of hyperparameters, including the training algorithm, corpus, dimension, and context window, and for validating the effectiveness of intrinsic evaluation.
Anthology ID:
2020.lrec-1.225
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
1823–1832
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.225
Cite (ACL):
Namgi Han, Katsuhiko Hayashi, and Yusuke Miyao. 2020. Analyzing Word Embedding Through Structural Equation Modeling. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1823–1832, Marseille, France. European Language Resources Association.
Cite (Informal):
Analyzing Word Embedding Through Structural Equation Modeling (Han et al., LREC 2020)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2020.lrec-1.225.pdf