VMWE discovery: a comparative analysis between Literature and Twitter Corpora

Vivian Stamou, Artemis Xylogianni, Marilena Malli, Penny Takorou, Stella Markantonatou


Abstract
We manually evaluate five lexical association measures for the discovery of Modern Greek verb multiword expressions with two or more lexicalised components, using mwetoolkit3 (Ramisch et al., 2010). We use Twitter corpora and compare our findings with previous work on fiction corpora. The results of LL, MLE and T-score were found to overlap significantly in both the fiction and the Twitter corpora, while the results of PMI and Dice do not. We find that MWEs with two lexicalised components are more frequent in Twitter than in fiction corpora, and that lean syntactic patterns help retrieve them more efficiently than richer ones. Our work (i) supports the enrichment of IDION, the lexicographical database for Modern Greek MWEs (Markantonatou et al., 2019), and (ii) highlights aspects of the usage of the five association measures on specific text genres for best MWE discovery results.
Anthology ID:
2020.mwe-1.8
Volume:
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
Month:
December
Year:
2020
Address:
online
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
66–72
URL:
https://aclanthology.org/2020.mwe-1.8
Cite (ACL):
Vivian Stamou, Artemis Xylogianni, Marilena Malli, Penny Takorou, and Stella Markantonatou. 2020. VMWE discovery: a comparative analysis between Literature and Twitter Corpora. In Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, pages 66–72, online. Association for Computational Linguistics.
Cite (Informal):
VMWE discovery: a comparative analysis between Literature and Twitter Corpora (Stamou et al., MWE 2020)
PDF:
https://preview.aclanthology.org/nodalida-main-page/2020.mwe-1.8.pdf