Exploiting Target Language Data for Neural Machine Translation Beyond Back Translation
Abudurexiti Reheman, Yingfeng Luo, Junhao Ruan, Chunliang Zhang, Anxiang Ma, Tong Xiao, JingBo Zhu
Abstract
Neural Machine Translation (NMT) encounters challenges when translating in new domains and low-resource languages. To address these issues, researchers have proposed methods to integrate additional knowledge into NMT, such as translation memories (TMs). However, finding TMs that closely match the input sentence remains challenging, particularly in specific domains. On the other hand, monolingual data is widely accessible in most languages, and back-translation is seen as a promising approach for utilizing target language data. Nevertheless, it still necessitates additional training. In this paper, we introduce Pseudo-kNN-MT, a variant of k-nearest neighbor machine translation (kNN-MT) that utilizes target language data by constructing a pseudo datastore. Furthermore, we investigate the utility of large language models (LLMs) for the kNN component. Experimental results demonstrate that our approach exhibits strong domain adaptation capability in both high-resource and low-resource machine translation. Notably, LLMs are found to be beneficial for robust NMT systems.
- Anthology ID:
- 2024.findings-acl.727
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2024
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 12216–12228
- URL:
- https://aclanthology.org/2024.findings-acl.727
- DOI:
- 10.18653/v1/2024.findings-acl.727
- Cite (ACL):
- Abudurexiti Reheman, Yingfeng Luo, Junhao Ruan, Chunliang Zhang, Anxiang Ma, Tong Xiao, and JingBo Zhu. 2024. Exploiting Target Language Data for Neural Machine Translation Beyond Back Translation. In Findings of the Association for Computational Linguistics: ACL 2024, pages 12216–12228, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- Exploiting Target Language Data for Neural Machine Translation Beyond Back Translation (Reheman et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/2024.findings-acl.727.pdf
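
The abstract builds on k-nearest neighbor machine translation (kNN-MT), which retrieves target tokens from a datastore at decoding time and mixes them with the NMT model's own prediction, without further training. The sketch below illustrates only that generic retrieval-and-interpolation step, not the paper's method: the abstract does not say how the pseudo datastore is constructed, so the datastore here is a hand-made toy, and the function names, the squared-L2 distance, and the parameters `k`, `temperature`, and `lam` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k=2, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over tokens.

    `datastore` is a toy list of (key_vector, target_token) pairs; a real
    system would store decoder hidden states and use an approximate index.
    """
    # Squared L2 distance from the query to every key, nearest first.
    scored = sorted(
        (
            (sum((q - c) ** 2 for q, c in zip(query, key)), token)
            for key, token in datastore
        ),
        key=lambda pair: pair[0],
    )[:k]

    # Closer neighbours get exponentially more weight; repeated tokens add up.
    weights = defaultdict(float)
    for dist, token in scored:
        weights[token] += math.exp(-dist / temperature)

    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


def interpolate(p_mt, p_knn, lam=0.5):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_mt."""
    vocab = set(p_mt) | set(p_knn)
    return {
        t: lam * p_knn.get(t, 0.0) + (1.0 - lam) * p_mt.get(t, 0.0)
        for t in vocab
    }
```

Calling `knn_distribution` with the current decoding state as `query` and passing the result to `interpolate` together with the NMT model's next-token distribution yields the final prediction. Setting `lam` to 0 recovers the base model, which makes it easy to check how much the retrieved neighbours contribute.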