Shoki Hamada
2025
AkibaNLP-TUT: Injecting Language-Specific Word-Level Noise for Low-Resource Language Translation
Shoki Hamada
|
Tomoyosi Akiba
|
Hajime Tsukada
Proceedings of the Tenth Conference on Machine Translation
In this paper, we describes our system for the WMT 2025 Low-Resource Indic Language Translation Shared Task.The language directions addressed are Assamese↔English and Manipuri→English.We propose a method to improve translation performance from low-resource languages (LRLs) to English by injecting Language-specific word-level noise into the parallel corpus of a closely related high-resource language (HRL).In the proposed method, word replacements are performed based on edit distance, using vocabulary and frequency information extracted from an LRL monolingual corpus.Experiments conducted on Assamese and Manipuri show that, in the absence of LRL parallel data, the proposed method outperforms both the w/o noise setting and existing approaches. Furthermore, we confirmed that increasing the size of the monolingual corpus used for noise injection leads to improved translation performance.