Evaluating language models for the retrieval and categorization of lexical collocations

Luis Espinosa Anke, Joan Codina-Filba, Leo Wanner


Abstract
Lexical collocations are idiosyncratic combinations of two syntactically bound lexical items (e.g., “heavy rain” or “take a step”). Understanding their degree of compositionality and idiosyncrasy, as well as their underlying semantics, is crucial for language learners, lexicographers and downstream NLP applications. In this paper, we perform an exhaustive analysis of current language models for collocation understanding. We first construct a dataset of occurrences of lexical collocations in context, categorized into 17 representative semantic categories. Then, we perform two experiments: (1) unsupervised collocate retrieval using BERT, and (2) supervised collocation classification in context. We find that most models perform well in distinguishing light verb constructions, especially if the collocation’s first argument acts as subject, but often fail to distinguish, first, different syntactic structures within the same semantic category, and second, fine-grained semantic categories which restrict the use of small sets of valid collocates for a given base.
Anthology ID:
2021.eacl-main.120
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1406–1417
URL:
https://aclanthology.org/2021.eacl-main.120
DOI:
10.18653/v1/2021.eacl-main.120
Cite (ACL):
Luis Espinosa Anke, Joan Codina-Filba, and Leo Wanner. 2021. Evaluating language models for the retrieval and categorization of lexical collocations. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1406–1417, Online. Association for Computational Linguistics.
Cite (Informal):
Evaluating language models for the retrieval and categorization of lexical collocations (Espinosa Anke et al., EACL 2021)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2021.eacl-main.120.pdf
Code:
luisespinosaanke/lexicalcollocations
Data:
SuperGLUE