A Reinforcement Learning Approach to Improve Low-Resource Machine Translation Leveraging Domain Monolingual Data

Hongxiao Zhang; Mingtong Liu (刘明童); Chunyou Li; Yufeng Chen; Jinan Xu; Ming Zhou

Abstract

Because parallel data is scarce, mainstream fine-tuning-based domain adaptation methods tend to overfit when translating low-resource domains, and the model struggles to learn generalizable in-domain knowledge. To address this issue, we propose a novel Reinforcement Learning Domain Adaptation method for Neural Machine Translation (RLDA-NMT) for low-resource domains. RLDA-NMT uses in-domain source-side monolingual data to compensate for the lack of parallel data, and reinforces the learning of domain features so that the translation model acquires domain-specific knowledge more fully. Specifically, we first train a ranking-based model on a small-scale in-domain parallel corpus, and then use it as the reward model to select higher-quality generated translations for reinforcement when fine-tuning a pre-trained NMT model on in-domain source monolingual data. We conduct experiments on the Education, Laws, Thesis, and Patent domains of Chinese⇔English translation tasks. Experimental results demonstrate that RLDA-NMT can alleviate overfitting and reinforce the NMT model's learning of domain-specific knowledge. The results also show that RLDA-NMT and back-translation (BT) are complementary: combining RLDA-NMT with BT further improves translation quality.
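
The selection step described in the abstract (a reward model picks the higher-quality generated translations to reinforce) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_for_reinforcement` and `toy_reward`, the keyword-overlap reward, and the example sentences are all hypothetical stand-ins chosen for illustration. In the paper the reward comes from a ranking-based model trained on in-domain parallel data, and the abstract does not specify how the selected translations enter the training objective.

```python
from typing import Callable, List, Tuple


def select_for_reinforcement(
    source: str,
    candidates: List[str],
    reward_fn: Callable[[str, str], float],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Score every candidate translation and keep the top_k by reward."""
    scored = [(cand, reward_fn(source, cand)) for cand in candidates]
    # Highest reward first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_reward(source: str, candidate: str) -> float:
    """Hypothetical reward: fraction of a fixed in-domain term set that
    appears in the candidate. A stand-in for a learned ranking model."""
    domain_terms = {"patent", "claim", "invention"}
    tokens = set(candidate.lower().split())
    return len(tokens & domain_terms) / len(domain_terms)


# Candidates that an NMT model might generate for one monolingual source
# sentence (invented examples, not data from the paper).
candidates = [
    "the invention relates to a device",
    "the patent claim covers the invention",
    "this thing is about a gadget",
]
best = select_for_reinforcement(
    "本发明涉及一种装置", candidates, toy_reward, top_k=2
)
for translation, reward in best:
    print(f"{reward:.2f}  {translation}")
```

In a real setup, the selected (source, translation) pairs would then feed the fine-tuning step, for example as pseudo-parallel examples or as reward-weighted samples; which of these the authors use is not stated in the abstract.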

Anthology ID: 2024.lrec-main.132
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 1486–1497
URL: https://aclanthology.org/2024.lrec-main.132
Cite (ACL): Hongxiao Zhang, Mingtong Liu, Chunyou Li, Yufeng Chen, Jinan Xu, and Ming Zhou. 2024. A Reinforcement Learning Approach to Improve Low-Resource Machine Translation Leveraging Domain Monolingual Data. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 1486–1497, Torino, Italia. ELRA and ICCL.
Cite (Informal): A Reinforcement Learning Approach to Improve Low-Resource Machine Translation Leveraging Domain Monolingual Data (Zhang et al., LREC-COLING 2024)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.132.pdf
