Hanna Sytar
2026
Ukrainian Multiword Expressions Corpus: Creation, Annotation, and Linguistic Analysis
Hanna Sytar | Maria Shvedova | Olha Kanishcheva
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
This paper presents the development of a corpus of annotated multiword expressions (MWEs) for Ukrainian. The resource covers four major categories of MWEs: verbal, nominal, adjectival/adverbial, and functional. We describe the methodology used for data selection, the annotation scheme, and the annotation procedures. The paper also discusses specific types of MWE constructions, illustrating their usage with numerous examples and addressing complex and borderline cases. The resulting corpus is an important resource for linguistic studies and NLP tasks involving MWEs, and is publicly accessible at https://gitlab.com/parseme/sharedtask-data/-/tree/master/2.0?ref_type=heads.