Arsenii Lukashevskyi


2025

pdf bib
Developing a Universal Dependencies Treebank for Ukrainian Parliamentary Speech
Maria Shvedova | Arsenii Lukashevskyi | Andriy Rysin
Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)

This paper presents a new Universal Dependencies (UD) treebank based on Ukrainian parliamentary transcripts, complementing the existing UD resources for Ukrainian. The corpus includes manually annotated texts from key historical sessions of the Verkhovna Rada, capturing not only official rhetoric but also features of colloquial spoken language. The annotation combines UDPipe2 and TagText parsers, with subsequent manual correction to ensure syntactic and morphological accuracy. A detailed comparison of tagsets and the disambiguation strategy employed by TagText is provided. To demonstrate the applicability of the resource, the study examines vocative and nominative case variation in direct address using a large-scale UD-annotated corpus of parliamentary texts.

2024

pdf bib
Creating Parallel Corpora for Ukrainian: A German-Ukrainian Parallel Corpus (ParaRook||DE-UK)
Maria Shvedova | Arsenii Lukashevskyi
Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024

Parallel corpora are currently a popular and vibrantly developing category of linguistic resources, used both in literature and translation studies, as well as in the field of NLP. For Ukrainian, though, there are still not enough significant parallel corpora compiled within a single roof project and made available to the research community. In this paper we present a newly developed resource, the German-Ukrainian Parallel Corpus — ParaRook||DE-UK, searchable online. We describe various issues related to its compilation, text selection, and annotation. The paper also features several examples of how the corpus can be used in linguistic research and translation studies. Using the experience of the German-Ukrainian parallel corpus, parallel corpora for other languages with Ukrainian can be developed.