Abstract
The HSemID system, submitted to the CogALex VI Shared Task, is a hybrid system relying mainly on metric clusters measured in large web corpora, complemented by a vector space model using cosine similarity to detect semantic associations. Although the system reached rather weak results for the subcategories of synonyms, antonyms and hypernyms, with some differences from one language to another, it is able to measure general semantic associations (as being random or not-random) with an F1 score close to 0.80. The results strongly suggest that idiomatic constructions play a fundamental role in semantic associations. Further experiments are necessary in order to fine-tune the model to the subcategories of synonyms, antonyms and hypernyms, and to explain surprising differences across languages.
- Anthology ID:
- 2020.cogalex-1.6
- Volume:
- Proceedings of the Workshop on the Cognitive Aspects of the Lexicon
- Month:
- December
- Year:
- 2020
- Address:
- Online
- Editors:
- Michael Zock, Emmanuele Chersoni, Alessandro Lenci, Enrico Santus
- Venue:
- CogALex
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 54–58
- URL:
- https://aclanthology.org/2020.cogalex-1.6
- Cite (ACL):
- Jean-Pierre Colson. 2020. Extracting meaning by idiomaticity: Description of the HSemID system at CogALex VI (2020). In Proceedings of the Workshop on the Cognitive Aspects of the Lexicon, pages 54–58, Online. Association for Computational Linguistics.
- Cite (Informal):
- Extracting meaning by idiomaticity: Description of the HSemID system at CogALex VI (2020) (Colson, CogALex 2020)
- PDF:
- https://preview.aclanthology.org/rocling-reingestion-23/2020.cogalex-1.6.pdf