@inproceedings{khatri-bhattacharyya-2020-filtering,
title = "Filtering Back-Translated Data in Unsupervised Neural Machine Translation",
author = "Khatri, Jyotsana and
Bhattacharyya, Pushpak",
editor = "Scott, Donia and
Bel, Nuria and
Zong, Chengqing",
booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
month = dec,
year = "2020",
address = "Barcelona, Spain (Online)",
publisher = "International Committee on Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.coling-main.383/",
doi = "10.18653/v1/2020.coling-main.383",
pages = "4334--4339",
abstract = "Unsupervised neural machine translation (NMT) utilizes only monolingual data for training. The quality of back-translated data plays an important role in the performance of NMT systems. In back-translation, all generated pseudo parallel sentence pairs are not of the same quality. Taking inspiration from domain adaptation where in-domain sentences are given more weight in training, in this paper we propose an approach to filter back-translated data as part of the training process of unsupervised NMT. Our approach gives more weight to good pseudo parallel sentence pairs in the back-translation phase. We calculate the weight of each pseudo parallel sentence pair using sentence-wise round-trip BLEU score which is normalized batch-wise. We compare our approach with the current state of the art approaches for unsupervised NMT."
}
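
The abstract outlines the weighting mechanism: score each back-translated pair by its sentence-level round-trip BLEU, normalize the scores within the batch, and use them to weight the back-translation training loss. A minimal sketch of that step follows, assuming PyTorch and sacrebleu. The exact batch-wise normalization scheme is not specified in the abstract, so scaling by the batch maximum is an assumption here, and the function names (round_trip_bleu_weights, weighted_nll) are hypothetical rather than the authors' code.

from typing import List

import sacrebleu
import torch
import torch.nn.functional as F

def round_trip_bleu_weights(originals: List[str], round_trips: List[str]) -> torch.Tensor:
    """Sentence-wise round-trip BLEU per pair, normalized within the batch.

    originals:   monolingual source sentences
    round_trips: the same sentences after source -> target -> source translation
    """
    scores = torch.tensor([
        sacrebleu.sentence_bleu(rt, [orig]).score  # sentence BLEU in [0, 100]
        for orig, rt in zip(originals, round_trips)
    ])
    # Batch-wise normalization (assumed form): scale by the batch maximum.
    return scores / scores.max().clamp(min=1e-8)

def weighted_nll(logits: torch.Tensor, targets: torch.Tensor,
                 weights: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    """Per-sentence token NLL, weighted by the round-trip BLEU weights.

    logits:  (batch, seq_len, vocab) decoder outputs on back-translated pairs
    targets: (batch, seq_len) reference token ids, padded with pad_id
    weights: (batch,) per-pair weights from round_trip_bleu_weights
    """
    per_token = F.cross_entropy(logits.transpose(1, 2), targets,
                                ignore_index=pad_id, reduction="none")
    mask = (targets != pad_id).float()
    per_sentence = (per_token * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
    return (weights * per_sentence).mean()

In an unsupervised NMT loop, round_trips would come from translating a monolingual batch into the other language and back with the current model, so the weights are recomputed as the model improves.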