Developing a Universal Dependencies Treebank for Ukrainian Parliamentary Speech

Maria Shvedova, Arsenii Lukashevskyi, Andriy Rysin


Abstract
This paper presents a new Universal Dependencies (UD) treebank based on Ukrainian parliamentary transcripts, complementing the existing UD resources for Ukrainian. The corpus includes manually annotated texts from key historical sessions of the Verkhovna Rada, capturing not only official rhetoric but also features of colloquial spoken language. The annotation combines the output of the UDPipe 2 and TagText parsers, followed by manual correction to ensure syntactic and morphological accuracy. The paper also provides a detailed comparison of the tagsets involved and describes the disambiguation strategy employed by TagText. To demonstrate the applicability of the resource, the study examines variation between the vocative and nominative cases in direct address, using a large-scale UD-annotated corpus of parliamentary texts.
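
The abstract does not say how instances of direct address are extracted from the UD-annotated corpus. As a minimal sketch, assume that address forms are the tokens attached with the UD `vocative` relation and that their case is read from the `Case` feature in the FEATS column of CoNLL-U. Under those assumptions, the counting step could look like the following. The function name `address_case_counts` and the two toy sentences are hypothetical illustrations, not the authors' code or data from the treebank.

```python
from collections import Counter


def address_case_counts(conllu_text: str) -> Counter:
    """Count Case values of tokens attached with the UD `vocative` relation.

    Hypothetical heuristic: direct address is approximated by DEPREL ==
    "vocative" (including subtypes such as "vocative:x"). This is not the
    paper's documented procedure.
    """
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and comment lines (# sent_id, # text).
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        token_id, feats, deprel = cols[0], cols[5], cols[7]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "3.1").
        if "-" in token_id or "." in token_id:
            continue
        if deprel.split(":")[0] != "vocative":
            continue
        # FEATS is "_" or a "|"-separated list of Name=Value pairs.
        feat_map = dict(f.split("=", 1) for f in feats.split("|") if "=" in f)
        counts[feat_map.get("Case", "None")] += 1
    return counts


def row(*cols: str) -> str:
    """Join ten CoNLL-U columns with tabs (helper for the toy sample)."""
    return "\t".join(cols)


# Toy sample (invented): the same address in the vocative and the nominative.
sample = "\n".join([
    "# text = Колего, дякую.",
    row("1", "Колего", "колега", "NOUN", "_", "Animacy=Anim|Case=Voc|Number=Sing", "3", "vocative", "_", "SpaceAfter=No"),
    row("2", ",", ",", "PUNCT", "_", "_", "1", "punct", "_", "_"),
    row("3", "дякую", "дякувати", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"),
    row("4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"),
    "",
    "# text = Колега, дякую.",
    row("1", "Колега", "колега", "NOUN", "_", "Animacy=Anim|Case=Nom|Number=Sing", "3", "vocative", "_", "SpaceAfter=No"),
    row("2", ",", ",", "PUNCT", "_", "_", "1", "punct", "_", "_"),
    row("3", "дякую", "дякувати", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"),
    row("4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"),
    "",
])

print(sorted(address_case_counts(sample).items()))  # [('Nom', 1), ('Voc', 1)]
```

In this sketch, tokens with no `Case` feature are counted under `"None"`. Only the token that bears the `vocative` relation is counted, so modifiers or name parts attached to it (for example, in a multi-word form of address) are ignored. A real study would need to decide how to treat such cases.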

Anthology ID: 2025.unlp-1.7
Volume: Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)
Month: July
Year: 2025
Address: Vienna, Austria (online)
Editor: Mariana Romanyshyn
Venues: UNLP | WS
Publisher: Association for Computational Linguistics
Pages: 55–63
URL: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.unlp-1.7/
Cite (ACL): Maria Shvedova, Arsenii Lukashevskyi, and Andriy Rysin. 2025. Developing a Universal Dependencies Treebank for Ukrainian Parliamentary Speech. In Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025), pages 55–63, Vienna, Austria (online). Association for Computational Linguistics.
Cite (Informal): Developing a Universal Dependencies Treebank for Ukrainian Parliamentary Speech (Shvedova et al., UNLP 2025)
PDF: https://preview.aclanthology.org/acl25-workshop-ingestion/2025.unlp-1.7.pdf