Abstract
WordNet is a state-of-the-art lexical resource used in many Natural Language Processing tasks, including multi-word expression (MWE) recognition. However, not all MWEs recorded in WordNet can indisputably be called lexicalised: some are semantically compositional and show no signs of idiosyncrasy. This affects every evaluation measure that uses the list of all WordNet MWEs as a gold standard. We propose a method for distinguishing between lexicalised and non-lexicalised word combinations in WordNet that takes into account lexicality features such as semantic compositionality, MWE length and a translational criterion. Both a rule-based approach and ridge logistic regression are applied; both beat a random baseline in precision at singling out lexicalised MWEs and in recall at ruling out non-lexicalised ones.
- Anthology ID:
- 2022.mwe-1.8
- Volume:
- Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Archna Bhatia, Paul Cook, Shiva Taslimipoor, Marcos Garcia, Carlos Ramisch
- Venue:
- MWE
- SIG:
- SIGLEX
- Publisher:
- European Language Resources Association
- Pages:
- 49–54
- URL:
- https://aclanthology.org/2022.mwe-1.8
- Cite (ACL):
- Marek Maziarz, Ewa Rudnicka, and Łukasz Grabowski. 2022. Multi-word Lexical Units Recognition in WordNet. In Proceedings of the 18th Workshop on Multiword Expressions @LREC2022, pages 49–54, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Multi-word Lexical Units Recognition in WordNet (Maziarz et al., MWE 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2022.mwe-1.8.pdf