A Comparative Study of Embedding Models in Predicting the Compositionality of Multiword Expressions

Navnita Nandakumar, Bahar Salehi, Timothy Baldwin


Abstract
In this paper, we perform a comparative evaluation of off-the-shelf embedding models over the task of compositionality prediction of multiword expressions ("MWEs"). Our experimental results suggest that character- and document-level models capture knowledge of MWE compositionality and are effective in modelling varying levels of compositionality, with the advantage over word-level models that they do not require token-level identification of MWEs in the training corpus.
Anthology ID:
U18-1009
Volume:
Proceedings of the Australasian Language Technology Association Workshop 2018
Month:
December
Year:
2018
Address:
Dunedin, New Zealand
Editors:
Sunghwan Mac Kim, Xiuzhen (Jenny) Zhang
Venue:
ALTA
Pages:
71–76
URL:
https://preview.aclanthology.org/icon-24-ingestion/U18-1009/
Cite (ACL):
Navnita Nandakumar, Bahar Salehi, and Timothy Baldwin. 2018. A Comparative Study of Embedding Models in Predicting the Compositionality of Multiword Expressions. In Proceedings of the Australasian Language Technology Association Workshop 2018, pages 71–76, Dunedin, New Zealand.
Cite (Informal):
A Comparative Study of Embedding Models in Predicting the Compositionality of Multiword Expressions (Nandakumar et al., ALTA 2018)
PDF:
https://preview.aclanthology.org/icon-24-ingestion/U18-1009.pdf