Lexical Substitution for Evaluating Compositional Distributional Models

Maja Buljan, Sebastian Padó, Jan Šnajder


Abstract
Compositional Distributional Semantic Models (CDSMs) model the meaning of phrases and sentences in vector space. They have been predominantly evaluated on limited, artificial tasks such as semantic sentence similarity on hand-constructed datasets. This paper argues for lexical substitution (LexSub) as a means to evaluate CDSMs. LexSub is a more natural task, enables us to evaluate meaning composition at the level of individual words, and provides a common ground to compare CDSMs with dedicated LexSub models. We create a LexSub dataset for CDSM evaluation from a corpus with manual “all-words” LexSub annotation. Our experiments indicate that the Practical Lexical Function CDSM outperforms simple component-wise CDSMs and performs on par with the context2vec LexSub model using the same context.
Anthology ID:
N18-2033
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Editors:
Marilyn Walker, Heng Ji, Amanda Stent
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
206–211
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/N18-2033/
DOI:
10.18653/v1/N18-2033
Cite (ACL):
Maja Buljan, Sebastian Padó, and Jan Šnajder. 2018. Lexical Substitution for Evaluating Compositional Distributional Models. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 206–211, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Lexical Substitution for Evaluating Compositional Distributional Models (Buljan et al., NAACL 2018)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/N18-2033.pdf
Dataset:
 N18-2033.Datasets.zip