Aleksandra Markovic

Also published as: Aleksandra Marković


2025

Lexica of MWEs have always been a valuable resource for various NLP tasks. This paper presents the results of a comprehensive survey of multiword lexical resources that extends a previous survey from 2016 to the present. We analyze a diverse set of lexica across multiple languages, reporting on aspects such as creation date, intended usage, languages covered and linguality type, content, acquisition method, accessibility, and linkage to other language resources. Our findings highlight trends in MWE lexicon development, focusing on how well different languages are represented. By identifying gaps and opportunities, this survey aims to support future efforts in creating MWE lexica for NLP applications.
This paper deals with light verb constructions (LVCs) and their annotation in ELEXIS-sr, the Serbian extension of the ELEXIS-WSD corpus. Section 1 gives general introductory remarks about these constructions, the notion of light verbs, and their treatment and further classification in the PARSEME annotation guidelines (subtypes LVC.full and LVC.cause). Section 2 offers an insight into the ELEXIS-WSD corpus, which is annotated with VMWEs for several languages, and notes that these VMWEs were not further subcategorised into finer classes. For this paper, we classified them ourselves to facilitate comparison with the LVCs annotated in ELEXIS-sr. Section 3 presents the tools and resources used for the automatic annotation of ELEXIS-sr, as well as the results of manual checking. In Section 4, we offer a comparison of LVCs in four ELEXIS-WSD sub-collections: Serbian, Bulgarian, Slovene, and English. We use Serbian as the starting point for this comparison, as it has been thoroughly annotated with MWEs (and NEs). We compare all occurrences of LVCs in the Serbian extension with their occurrences and annotation in both the ELEXIS-WSD and PARSEME sub-corpora for the other languages. An important conclusion is that the largest number of LVC equivalents is found between Serbian and Bulgarian, two closely related Slavic languages (34 equivalents in total), while between Serbian and Slovene, also a Slavic language, there are 11 equivalents, the same number as between Serbian and English. This may be explained by the number of VMWEs and LVCs annotated, or by the strategies adopted by different annotators.