Low-resource Buryat-Russian neural machine translation

Dari Baturova, Sarana Abidueva, Dmitrii Lichko, Ivan Bondarenko


Abstract
This paper presents a study on the development of a neural machine translation (NMT) system for the Russian-Buryat language pair, focusing on the challenges of low-resource translation. We also present a parallel corpus, constructed by processing existing texts and organizing the translation process, and supplemented by data augmentation techniques to enhance model training. We achieve BLEU scores of 20 and 35 for translation into Buryat and Russian, respectively. Native speakers evaluated the translations as acceptable. Future directions include expanding and cleaning the dataset, improving model training techniques, and exploring dialectal variation within the Buryat language.
Anthology ID:
2025.fieldmatters-1.8
Volume:
Proceedings of the Fourth Workshop on NLP Applications to Field Linguistics
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Éric Le Ferrand, Elena Klyachko, Anna Postnikova, Tatiana Shavrina, Oleg Serikov, Ekaterina Voloshina, Ekaterina Vylomova
Venues:
FieldMatters | WS
Publisher:
Association for Computational Linguistics
Pages:
85–93
URL:
https://preview.aclanthology.org/transition-to-people-yaml/2025.fieldmatters-1.8/
Cite (ACL):
Dari Baturova, Sarana Abidueva, Dmitrii Lichko, and Ivan Bondarenko. 2025. Low-resource Buryat-Russian neural machine translation. In Proceedings of the Fourth Workshop on NLP Applications to Field Linguistics, pages 85–93, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Low-resource Buryat-Russian neural machine translation (Baturova et al., FieldMatters 2025)
PDF:
https://preview.aclanthology.org/transition-to-people-yaml/2025.fieldmatters-1.8.pdf