Maria Shvedova
2026
Semantic Fidelity Versus Literary Quality: A Construct Validity Study of Neural Machine Translation Metrics
Dmytro Chaplynskyi | Ivan Kulynych | Maria Shvedova | Lesia Ivashkevych
Proceedings of the Fifth Ukrainian Natural Language Processing Conference (UNLP 2026)
Automatic machine translation metrics are the de facto standard for evaluating translation quality. Yet it remains unclear what they actually measure. We investigate this question using a unique multilingual corpus: seven human Ukrainian translations of George Orwell’s Animal Farm, alongside three architecturally distinct AI systems (GPT-5.2, DeepL, and Lapa, a Ukrainian-tuned LLM). Across seven neural metrics, four reference-free and three reference-based, all three AI translations rank at the top. However, stylometric analysis shows that these same AI translations are less lexically rich than the human ones (18% lower MTLD), underuse Ukrainian particles (up to 2x fewer) and diminutive morphology (2.6x fewer), and converge on near-identical outputs (LaBSE pairwise similarity 0.941 vs. 0.711 for human pairs). A controlled LLM-as-a-judge experiment demonstrates a clear preference reversal: when the English source is visible, AI ranks first; when it is hidden and the judge evaluates literary quality alone, humans rise to the top and AI falls to the lower ranks. Human evaluation (1,034 pairwise judgments) is balanced across both patterns. We argue that current MT metrics reward semantic fidelity and surface fluency, properties optimized by AI systems, while failing to capture the lexical richness, cultural adaptation, and stylistic voice that characterize skilled literary translation.
Ukrainian Multiword Expressions Corpus: Creation, Annotation, and Linguistic Analysis
Hanna Sytar | Maria Shvedova | Olha Kanishcheva
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
This paper presents the development of a corpus of annotated multiword expressions (MWEs) for Ukrainian. The resource covers four major categories of MWEs: verbal, nominal, adjectival/adverbial, and functional. We describe the methodology used for data selection, the annotation scheme, and the procedures employed during annotation. In addition, the paper discusses some specific types of MWE constructions, illustrating their usage with numerous examples and addressing complex and borderline cases. The resulting corpus is an important resource for linguistic studies and NLP tasks involving MWEs, and is publicly accessible at https://gitlab.com/parseme/sharedtask-data/-/tree/master/2.0?ref_type=heads.
2025
Developing a Universal Dependencies Treebank for Ukrainian Parliamentary Speech
Maria Shvedova | Arsenii Lukashevskyi | Andriy Rysin
Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)
This paper presents a new Universal Dependencies (UD) treebank based on Ukrainian parliamentary transcripts, complementing the existing UD resources for Ukrainian. The corpus includes manually annotated texts from key historical sessions of the Verkhovna Rada, capturing not only official rhetoric but also features of colloquial spoken language. The annotation combines UDPipe2 and TagText parsers, with subsequent manual correction to ensure syntactic and morphological accuracy. A detailed comparison of tagsets and the disambiguation strategy employed by TagText is provided. To demonstrate the applicability of the resource, the study examines vocative and nominative case variation in direct address using a large-scale UD-annotated corpus of parliamentary texts.
2024
Creating Parallel Corpora for Ukrainian: A German-Ukrainian Parallel Corpus (ParaRook||DE-UK)
Maria Shvedova | Arsenii Lukashevskyi
Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024
Parallel corpora are currently a popular and rapidly developing category of linguistic resources, used in literary and translation studies as well as in NLP. For Ukrainian, though, there are still too few sizable parallel corpora compiled within a single project and made available to the research community. In this paper we present a newly developed resource, the German-Ukrainian Parallel Corpus ParaRook||DE-UK, which is searchable online. We describe various issues related to its compilation, text selection, and annotation. The paper also features several examples of how the corpus can be used in linguistic research and translation studies. The experience gained with the German-Ukrainian parallel corpus can inform the development of parallel corpora pairing Ukrainian with other languages.
2023
The Parliamentary Code-Switching Corpus: Bilingualism in the Ukrainian Parliament in the 1990s-2020s
Olha Kanishcheva | Maria Shvedova | Tetiana Kovalova | Ruprecht von Waldenfels
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)
We describe a Ukrainian-Russian code-switching corpus of Ukrainian parliamentary session transcripts. The corpus includes speeches entirely in Ukrainian, speeches entirely in Russian, and various types of mixed speech, and allows us to see how speakers switch between these languages depending on the communicative situation. The paper describes the process of creating this corpus from the official multilingual transcripts using automatic language detection and publicly available metadata on the speakers. On this basis, we consider possible reasons for the change in the number of Ukrainian speakers in the parliament and present the most common patterns of Ukrainian-Russian code-switching in parliamentarians’ speeches.