How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions

Navnita Nandakumar, Timothy Baldwin, Bahar Salehi

Abstract
In this paper, we apply various embedding methods to multiword expressions to study how well they capture the nuances of non-compositional data. Our results from a pool of word-, character-, and document-level embeddings suggest that Word2vec performs the best, followed by FastText and InferSent. Moreover, we find that recently-proposed contextualised embedding models such as BERT and ELMo are not adept at handling non-compositionality in multiword expressions.
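As a concrete illustration, the sketch below shows the standard compositionality measure this line of work builds on: compare the embedding of the whole expression against a composition (here, the sum) of its constituent embeddings, with low similarity indicating non-compositionality. This is a minimal sketch, not the authors' released code; the gensim model name and the example MWE token are illustrative assumptions.

import numpy as np
import gensim.downloader as api

# Pre-trained word2vec vectors (illustrative choice of model).
model = api.load("word2vec-google-news-300")

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def compositionality(mwe_token, parts):
    """Cosine between the MWE's own vector (a single phrase token,
    e.g. "couch_potato") and the sum of its component word vectors.
    A low score suggests the expression is non-compositional."""
    whole = model[mwe_token]
    composed = np.sum([model[w] for w in parts], axis=0)
    return cosine(whole, composed)

# Assumes this phrase token exists in the model's vocabulary.
if "couch_potato" in model:
    print(compositionality("couch_potato", ["couch", "potato"]))

Summation is only one choice of composition function; averaging or element-wise multiplication of the constituent vectors are common alternatives in the MWE compositionality literature.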
Anthology ID:
W19-2004
Volume:
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
Month:
June
Year:
2019
Address:
Minneapolis, USA
Editors:
Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Yoav Goldberg
Venue:
RepEval
Publisher:
Association for Computational Linguistics
Pages:
27–34
URL:
https://aclanthology.org/W19-2004
DOI:
10.18653/v1/W19-2004
Cite (ACL):
Navnita Nandakumar, Timothy Baldwin, and Bahar Salehi. 2019. How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions. In Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP, pages 27–34, Minneapolis, USA. Association for Computational Linguistics.
Cite (Informal):
How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions (Nandakumar et al., RepEval 2019)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/W19-2004.pdf