@inproceedings{maxim-2026-script,
title = "Script Correction and Synthetic Pivoting: Adapting Tencent {HY}-{MT} for Low-Resource {T}urkic Translation",
author = "Maxim, Bolgov",
editor = "Ojha, Atul Kr. and
Liu, Chao-hong and
Vylomova, Ekaterina and
Pirinen, Flammie and
Washington, Jonathan and
Oco, Nathaniel and
Zhao, Xiaobing",
booktitle = "Proceedings for the Ninth Workshop on Technologies for Machine Translation of Low Resource Languages ({L}o{R}es{MT} 2026)",
month = mar,
year = "2026",
address = "Rabat, Morocco",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/manual-author-scripts/2026.loresmt-1.20/",
pages = "217--221",
ISBN = "979-8-89176-366-1",
abstract = "This paper describes a submission to the LoResMT 2026 Shared Task for the Russian-Kazakh, Russian-Bashkir, and English-Chuvash tracks. The primary approach involves parameter-efficient fine-tuning (LoRA) of the Tencent HY-MT1.5-7B multilingual model. For the Russian-Kazakh and Russian-Bashkir pairs, LoRA adaptation was employed to correct the model{'}s default Arabic script output to Cyrillic. For the extremely low-resource English-Chuvash pair, two strategies were compared: mixed training on authentic English-Chuvash and Russian-Chuvash data versus training exclusively on a synthetic English-Chuvash corpus created via pivoting through Russian. Baseline systems included NLLB 1.3B (distilled) for Russian-Kazakh and Russian-Bashkir, and Gemma 2 3B for English-Chuvash. Results demonstrate that adapting a strong multilingual backbone with LoRA yields significant improvements over baselines while successfully addressing script mismatch challenges. Code for training and inference is released at: https://github.com/defdet/low-resource-langs-mt-adapt"
}