mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing

Silvio Cordeiro, Carlos Ramisch, Aline Villavicencio


Abstract
This paper presents mwetoolkit+sem: an extension of the mwetoolkit that estimates semantic compositionality scores for multiword expressions (MWEs) based on word embeddings. First, we describe our implementation of vector-space operations working on distributional vectors. The compositionality score is based on the cosine distance between the MWE vector and the composition of the vectors of its member words. Our generic system can handle several types of word embeddings and MWE lists, and may combine individual word representations using several composition techniques. We evaluate our implementation on a dataset of 1042 English noun compounds, comparing different configurations of the underlying word embeddings and word-composition models. We show that our vector-based scores model non-compositionality better than standard association measures such as log-likelihood.
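
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the mwetoolkit+sem implementation or its API: the function names, the choice of additive composition (only one of the "several composition techniques" the abstract mentions), and the toy 3-dimensional vectors are all assumptions made for illustration. The sketch returns cosine similarity, where higher means more compositional; the corresponding cosine distance is 1 minus this value.

```python
import math


def add_vectors(vectors):
    """Additive composition: element-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def compositionality_score(mwe_vector, member_vectors):
    """Compare the MWE's own vector with the composition of its members.

    Hypothetical helper: composes the member word vectors additively and
    returns the cosine similarity with the vector of the whole expression.
    """
    composed = add_vectors(member_vectors)
    return cosine_similarity(mwe_vector, composed)


# Toy vectors invented for this example; real embeddings would come from
# a trained distributional model in which each MWE is a single token.
toy_vectors = {
    "olive": [1.0, 0.0, 1.0],
    "oil": [0.0, 1.0, 1.0],
    "olive_oil": [1.0, 1.0, 2.0],   # close to olive + oil
    "hot": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "hot_dog": [0.0, 0.0, 1.0],     # unrelated to hot + dog
}

compositional = compositionality_score(
    toy_vectors["olive_oil"], [toy_vectors["olive"], toy_vectors["oil"]]
)
idiomatic = compositionality_score(
    toy_vectors["hot_dog"], [toy_vectors["hot"], toy_vectors["dog"]]
)
print(compositional > idiomatic)
```

In this toy setting the vector for the compositional compound coincides with the sum of its parts, so it scores higher than the idiomatic one, whose vector is orthogonal to the sum of its parts.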
Anthology ID: L16-1194
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 1221–1225
URL: https://aclanthology.org/L16-1194
Cite (ACL): Silvio Cordeiro, Carlos Ramisch, and Aline Villavicencio. 2016. mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1221–1225, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing (Cordeiro et al., LREC 2016)
PDF: https://preview.aclanthology.org/ingestion-script-update/L16-1194.pdf