Polish corpus of verbal multiword expressions

Agata Savary, Jakub Waszczuk


Abstract
This paper describes a manually annotated corpus of verbal multiword expressions in Polish. It is among the four largest datasets in release 1.2 of the PARSEME multilingual corpus. We describe the data sources, the annotation process, and its outcomes. We also present interesting phenomena encountered during annotation and propose enhancements to the PARSEME annotation guidelines.
Anthology ID:
2020.mwe-1.5
Volume:
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
Month:
December
Year:
2020
Address:
online
Venue:
MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
32–43
URL:
https://aclanthology.org/2020.mwe-1.5
Cite (ACL):
Agata Savary and Jakub Waszczuk. 2020. Polish corpus of verbal multiword expressions. In Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, pages 32–43, online. Association for Computational Linguistics.
Cite (Informal):
Polish corpus of verbal multiword expressions (Savary & Waszczuk, MWE 2020)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2020.mwe-1.5.pdf
Code:
parseme/parseme_corpus_pl