Abstract
In this paper, we apply various embedding methods to multiword expressions to study how well they capture the nuances of non-compositional data. Our results from a pool of word-, character-, and document-level embeddings suggest that word2vec performs best, followed by fastText and InferSent. Moreover, we find that recently proposed contextualised embedding models such as BERT and ELMo are not adept at handling non-compositionality in multiword expressions.
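To make the kind of probe the abstract describes concrete, here is a minimal, hypothetical sketch (not the authors' exact pipeline) of one common way to measure non-compositionality with word2vec-style embeddings: pre-join the multiword expression into a single token so the model learns a dedicated vector for it, then compare that vector against the additive composition of its parts. The toy corpus, token names, and hyperparameters below are illustrative assumptions.

```python
# A minimal sketch of a compositionality probe with gensim word2vec.
# Assumption: the MWE "kick_the_bucket" is pre-joined into one token in the
# training corpus, so the model learns a vector for the whole expression.
import numpy as np
from gensim.models import Word2Vec

corpus = [
    ["he", "will", "kick_the_bucket", "soon"],
    ["kick", "the", "bucket", "down", "the", "road"],
    ["she", "may", "kick_the_bucket", "any", "day"],
] * 50  # repeat the toy sentences so every token clears min_count

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, seed=0)

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# The MWE's own learned vector vs. the sum of its components' vectors.
mwe_vec = model.wv["kick_the_bucket"]
composed = model.wv["kick"] + model.wv["the"] + model.wv["bucket"]

# A low similarity suggests non-compositionality: the expression's
# distributional behaviour diverges from what its parts predict.
print("cosine(MWE, sum of parts) =", cosine(mwe_vec, composed))
```

On a real corpus, a low cosine between the two vectors signals that the expression's meaning is not recoverable from its parts; the same comparison can be repeated with character- or document-level models to rank embedding methods, as the paper does.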
- Anthology ID: W19-2004
- Volume: Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
- Month: June
- Year: 2019
- Address: Minneapolis, USA
- Editors: Anna Rogers, Aleksandr Drozd, Anna Rumshisky, Yoav Goldberg
- Venue: RepEval
- Publisher: Association for Computational Linguistics
- Pages: 27–34
- URL: https://preview.aclanthology.org/icon-24-ingestion/W19-2004/
- DOI: 10.18653/v1/W19-2004
- Cite (ACL): Navnita Nandakumar, Timothy Baldwin, and Bahar Salehi. 2019. How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions. In Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP, pages 27–34, Minneapolis, USA. Association for Computational Linguistics.
- Cite (Informal): How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions (Nandakumar et al., RepEval 2019)
- PDF: https://preview.aclanthology.org/icon-24-ingestion/W19-2004.pdf