Abstract
This paper describes our submission to the WMT20 news translation shared task in the English-to-Japanese direction. Our main approach transfers knowledge of domain and linguistic characteristics by pre-training the encoder-decoder model on a large amount of in-domain monolingual data through unsupervised and supervised prediction tasks. We then fine-tune the model with parallel data and in-domain synthetic data generated with iterative back-translation. For additional gain, we generate final results with an ensemble model and re-rank them with averaged models and language models. Through these methods, we achieve a gain of +5.42 BLEU compared to the baseline model.
- Anthology ID:
- 2020.wmt-1.11
- Volume:
- Proceedings of the Fifth Conference on Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 139–144
- URL:
- https://aclanthology.org/2020.wmt-1.11
- Cite (ACL):
- Jiwan Kim, Soyoon Park, Sangha Kim, and Yoonjung Choi. 2020. An Iterative Knowledge Transfer NMT System for WMT20 News Translation Task. In Proceedings of the Fifth Conference on Machine Translation, pages 139–144, Online. Association for Computational Linguistics.
- Cite (Informal):
- An Iterative Knowledge Transfer NMT System for WMT20 News Translation Task (Kim et al., WMT 2020)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/2020.wmt-1.11.pdf