Ukrainian Multiword Expressions Corpus: Creation, Annotation, and Linguistic Analysis

Hanna Sytar, Maria Shvedova, Olha Kanishcheva


Abstract
This paper presents the development of a corpus of annotated multiword expressions (MWEs) for Ukrainian. The resource covers four major categories of MWEs: verbal, nominal, adjectival/adverbial, and functional. We describe the methodology used for data selection, the annotation scheme, and the procedures followed during annotation. In addition, the paper discusses specific types of MWE constructions, illustrating their usage with numerous examples and addressing complex and borderline cases. The resulting corpus is a valuable resource for linguistic studies and for NLP tasks involving MWEs, and is publicly accessible at https://gitlab.com/parseme/sharedtask-data/-/tree/master/2.0?ref_type=heads.
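The corpus is distributed through the PARSEME shared-task repository linked in the abstract. Assuming the Ukrainian files follow the PARSEME .cupt convention (CoNLL-U plus an 11th PARSEME:MWE column, where `*` marks a token outside any MWE, the first token of an expression carries `id:CATEGORY`, later tokens carry only the `id`, and overlapping expressions are separated by `;`), a minimal Python sketch for grouping tokens into expressions and counting them per category might look as follows. All function and class names are hypothetical, and the code reads category labels from the data rather than assuming a particular label inventory.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class MWE:
    """One annotated expression within a sentence (hypothetical helper type)."""
    category: str
    token_ids: list[str] = field(default_factory=list)


def parse_mwe_column(value: str) -> list[tuple[str, str | None]]:
    """Split a PARSEME:MWE cell into (mwe_id, category) pairs.

    '*' (no MWE) and '_' (not annotated) yield no pairs; a continuation
    token such as '1' yields ('1', None).
    """
    if value in ("*", "_"):
        return []
    pairs = []
    for code in value.split(";"):
        mwe_id, _, category = code.partition(":")
        pairs.append((mwe_id, category or None))
    return pairs


def read_cupt_mwes(lines: Iterable[str]) -> Iterator[dict[str, MWE]]:
    """Yield {mwe_id: MWE} for every sentence that contains at least one MWE."""
    mwe_col = 10  # assumed default: PARSEME:MWE is the 11th column
    sentence: dict[str, MWE] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("# global.columns"):
            # Honour an explicit column declaration if the file has one.
            cols = line.split("=", 1)[1].split()
            if "PARSEME:MWE" in cols:
                mwe_col = cols.index("PARSEME:MWE")
            continue
        if not line.strip():  # a blank line closes the current sentence
            if sentence:
                yield sentence
            sentence = {}
            continue
        if line.startswith("#"):  # other sentence-level comments
            continue
        cells = line.split("\t")
        token_id = cells[0]
        if "-" in token_id or "." in token_id or len(cells) <= mwe_col:
            continue  # skip multiword-token ranges, empty nodes, short rows
        for mwe_id, category in parse_mwe_column(cells[mwe_col]):
            mwe = sentence.setdefault(mwe_id, MWE(category or "?"))
            if category:
                mwe.category = category
            mwe.token_ids.append(token_id)
    if sentence:  # the file may not end with a blank line
        yield sentence


def count_categories(lines: Iterable[str]) -> Counter:
    """Count annotated expressions per category label."""
    counts: Counter = Counter()
    for sentence in read_cupt_mwes(lines):
        for mwe in sentence.values():
            counts[mwe.category] += 1
    return counts
```

For instance, passing an open file handle (`open(path, encoding="utf-8")`, with `path` pointing to one of the Ukrainian .cupt files) to `count_categories` would tally expressions per label. Grouping by MWE id rather than by adjacency keeps discontinuous expressions, such as a light verb separated from its noun, in a single entry.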
Anthology ID:
2026.mwe-1.4
Volume:
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Atul Kr. Ojha, Verginica Barbu Mititelu, Mathieu Constant, Ivelina Stoyanova, A. Seza Doğruöz, Alexandre Rademaker
Venues:
MWE | WS
Publisher:
Association for Computational Linguistics
Pages:
38–47
URL:
https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.4/
Cite (ACL):
Hanna Sytar, Maria Shvedova, and Olha Kanishcheva. 2026. Ukrainian Multiword Expressions Corpus: Creation, Annotation, and Linguistic Analysis. In Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026), pages 38–47, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Ukrainian Multiword Expressions Corpus: Creation, Annotation, and Linguistic Analysis (Sytar et al., MWE 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.4.pdf