Lexicalised and non-lexicalized multi-word expressions in WordNet: a cross-encoder approach

Marek Maziarz, Łukasz Grabowski, Tadeusz Piotrowski, Ewa Rudnicka, Maciej Piasecki


Abstract
Focusing on the recognition of multi-word expressions (MWEs), we address the problem of recording MWEs in WordNet. Not all MWEs recorded in that lexical database can be considered lexicalised beyond doubt (e.g. elements of the wordnet taxonomy, quantifier phrases, certain collocations). In this paper, we use a cross-encoder approach to improve on our earlier method of distinguishing between lexicalised and non-lexicalised MWEs found in WordNet, which used custom-designed rule-based and statistical approaches. We achieve an F1-measure close to 80% for the class of lexicalised word combinations, easily beating two baselines (a random one and a majority-class one). The language model also proves to be better than a feature-based logistic regression model.
Anthology ID:
2023.gwc-1.28
Volume:
Proceedings of the 12th Global Wordnet Conference
Month:
January
Year:
2023
Address:
University of the Basque Country, Donostia - San Sebastian, Basque Country
Editors:
German Rigau, Francis Bond, Alexandre Rademaker
Venue:
GWC
Publisher:
Global Wordnet Association
Pages:
228–234
URL:
https://aclanthology.org/2023.gwc-1.28
Cite (ACL):
Marek Maziarz, Łukasz Grabowski, Tadeusz Piotrowski, Ewa Rudnicka, and Maciej Piasecki. 2023. Lexicalised and non-lexicalized multi-word expressions in WordNet: a cross-encoder approach. In Proceedings of the 12th Global Wordnet Conference, pages 228–234, University of the Basque Country, Donostia - San Sebastian, Basque Country. Global Wordnet Association.
Cite (Informal):
Lexicalised and non-lexicalized multi-word expressions in WordNet: a cross-encoder approach (Maziarz et al., GWC 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2023.gwc-1.28.pdf