Abstract
This paper explores the use of word2vec and GloVe embeddings for unsupervised measurement of the semantic compositionality of MWE candidates. Through comparison with several human-annotated reference sets, we find word2vec to be substantively superior to GloVe for this task. We also find Simple English Wikipedia to be a poor-quality resource for compositionality assessment, but demonstrate that a sample of 10% of sentences in the English Wikipedia can provide a conveniently tractable corpus with only moderate reduction in the quality of outputs.
- Anthology ID:
- 2020.mwe-1.12
- Volume:
- Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
- Month:
- December
- Year:
- 2020
- Address:
- online
- Editors:
- Stella Markantonatou, John McCrae, Jelena Mitrović, Carole Tiberius, Carlos Ramisch, Ashwini Vaidya, Petya Osenova, Agata Savary
- Venue:
- MWE
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 95–100
- URL:
- https://aclanthology.org/2020.mwe-1.12
- Cite (ACL):
- Thomas Pickard. 2020. Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality. In Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, pages 95–100, online. Association for Computational Linguistics.
- Cite (Informal):
- Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality (Pickard, MWE 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2020.mwe-1.12.pdf
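The abstract does not state the exact scoring function, so the sketch below assumes one that is common in the MWE compositionality literature; it is not taken from this paper. Under that assumption, a candidate's score is the cosine similarity between the embedding learned for the whole MWE as a single token (e.g. `swimming_pool`) and the sum of its component word embeddings. The scores are then checked against human ratings with Spearman's rank correlation. The 3-dimensional vectors, the ratings, and the helper names (`compositionality`, `spearman`) are all hypothetical stand-ins for real word2vec or GloVe output and a real reference set.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def compositionality(mwe_vec, component_vecs):
    """Assumed additive model: cosine(MWE vector, sum of component vectors)."""
    summed = [sum(xs) for xs in zip(*component_vecs)]
    return cosine(mwe_vec, summed)

def rank(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical toy embeddings; a real run would load word2vec/GloVe vectors
# trained on a corpus where each MWE candidate was merged into one token.
emb = {
    "swimming": [1.0, 0.0, 0.0], "pool": [0.0, 1.0, 0.0],
    "swimming_pool": [0.9, 1.0, 0.1],
    "red": [1.0, 0.2, 0.0], "tape": [0.0, 1.0, 0.2],
    "red_tape": [0.1, 0.0, 1.0],
    "climate": [0.5, 0.5, 0.0], "change": [0.0, 0.5, 0.5],
    "climate_change": [0.5, 0.6, 0.6],
}

# Hypothetical human compositionality ratings (higher = more compositional).
human = {"swimming_pool": 4.8, "red_tape": 0.5, "climate_change": 4.0}

scores = {
    mwe: compositionality(emb[mwe], [emb[w] for w in mwe.split("_")])
    for mwe in human
}
rho = spearman([scores[m] for m in human], [human[m] for m in human])
print(scores)
print(rho)
```

In this toy data the idiomatic `red_tape` has a vector far from the sum of `red` and `tape`, so it gets the lowest score. Because the model's ranking matches the hypothetical human ranking, rho is approximately 1.0. Comparing embedding methods or corpora, as the abstract describes, would amount to repeating this correlation with vectors from each setup.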