Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair

Maksim Borisov, Zhanibek Kozhirbayev, Valentin Malykh


Abstract
Machine translation for low-resource language pairs is a challenging task, and it becomes considerably harder when a speaker uses code-switching. We present the first code-switching Kazakh-Russian parallel corpus. Additionally, we propose a method to build a machine translation model for the code-switched Kazakh-Russian language pair with no labeled data. Our method is based on the generation of synthetic data. The resulting model outperforms an existing commercial system in human evaluation.
Anthology ID:
2025.naacl-srw.7
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month:
April
Year:
2025
Address:
Albuquerque, USA
Editors:
Abteen Ebrahimi, Samar Haider, Emmy Liu, Maria Leonor Pacheco, Shira Wein
Venues:
NAACL | WS
Publisher:
Association for Computational Linguistics
Pages:
66–76
URL:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-srw.7/
Cite (ACL):
Maksim Borisov, Zhanibek Kozhirbayev, and Valentin Malykh. 2025. Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 66–76, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal):
Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair (Borisov et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.naacl-srw.7.pdf