Improving NMT via Filtered Back Translation
Nikhil Jaiswal, Mayur Patidar, Surabhi Kumari, Manasi Patwardhan, Shirish Karande, Puneet Agarwal, Lovekesh Vig
Abstract
Document-Level Machine Translation (MT) has become an active research area among the NLP community in recent years. Unlike sentence-level MT, which translates the sentences independently, document-level MT aims to utilize contextual information while translating a given source sentence. This paper demonstrates our submission (Team ID - DEEPNLP) to the Document-Level Translation task organized by WAT 2020. This task focuses on translating texts from a business dialog corpus while optionally utilizing the context present in the dialog. In our proposed approach, we utilize publicly available parallel corpus from different domains to train an open domain base NMT model. We then use monolingual target data to create filtered pseudo parallel data and employ Back-Translation to fine-tune the base model. This is further followed by fine-tuning on the domain-specific corpus. We also ensemble various models to improvise the translation performance. Our best models achieve a BLEU score of 26.59 and 22.83 in an unconstrained setting and 15.10 and 10.91 in the constrained settings for En->Ja & Ja->En direction, respectively.- Anthology ID:
- 2020.wat-1.19
- Volume:
- Proceedings of the 7th Workshop on Asian Translation
- Month:
- December
- Year:
- 2020
- Address:
- Suzhou, China
- Editors:
- Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Win Pa Pa, Ondřej Bojar, Shantipriya Parida, Isao Goto, Hidaya Mino, Hiroshi Manabe, Katsuhito Sudoh, Sadao Kurohashi, Pushpak Bhattacharyya
- Venue:
- WAT
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 154–159
- Language:
- URL:
- https://aclanthology.org/2020.wat-1.19
- DOI:
- Cite (ACL):
- Nikhil Jaiswal, Mayur Patidar, Surabhi Kumari, Manasi Patwardhan, Shirish Karande, Puneet Agarwal, and Lovekesh Vig. 2020. Improving NMT via Filtered Back Translation. In Proceedings of the 7th Workshop on Asian Translation, pages 154–159, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Improving NMT via Filtered Back Translation (Jaiswal et al., WAT 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2020.wat-1.19.pdf
- Data
- JESC, MTNT, WikiMatrix