Spivavtor: An Instruction Tuned Ukrainian Text Editing Model

Aman Saini, Artem Chernodub, Vipul Raheja, Vivek Kulkarni


Abstract
We introduce Spivavtor, a dataset and a set of instruction-tuned models for text editing in the Ukrainian language. Spivavtor is the Ukrainian-focused adaptation of the English-only CoEdIT (Raheja et al., 2023) model. Like CoEdIT, Spivavtor performs text editing tasks by following instructions, here written in Ukrainian, such as “Виправте граматику в цьому реченні” and “Спростіть це речення”, which translate to “Correct the grammar in this sentence” and “Simplify this sentence” in English, respectively. This paper describes the details of the Spivavtor-Instruct dataset and the Spivavtor models. We evaluate Spivavtor on a variety of text editing tasks in Ukrainian, namely Grammatical Error Correction (GEC), Text Simplification, Coherence, and Paraphrasing, and demonstrate its superior performance on all of them. We publicly release our best-performing models and data as resources for the community to advance further research in this space.
Anthology ID: 2024.unlp-1.12
Volume: Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Mariana Romanyshyn, Nataliia Romanyshyn, Andrii Hlybovets, Oleksii Ignatenko
Venue: UNLP
Publisher: ELRA and ICCL
Pages: 95–108
URL: https://aclanthology.org/2024.unlp-1.12
Cite (ACL): Aman Saini, Artem Chernodub, Vipul Raheja, and Vivek Kulkarni. 2024. Spivavtor: An Instruction Tuned Ukrainian Text Editing Model. In Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pages 95–108, Torino, Italia. ELRA and ICCL.
Cite (Informal): Spivavtor: An Instruction Tuned Ukrainian Text Editing Model (Saini et al., UNLP 2024)
PDF: https://preview.aclanthology.org/nschneid-patch-3/2024.unlp-1.12.pdf