A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size

Masato Neishi, Jin Sakuma, Satoshi Tohda, Shonosuke Ishiwatari, Naoki Yoshinaga, Masashi Toyoda


Abstract
In this paper, we describe the team UT-IIS's system and results for the WAT 2017 translation tasks. We further investigated several tricks, including a novel technique for initializing embedding layers using only the parallel corpus, which increased the BLEU score by 1.28; we also found a large batch size of 256 to be practical and gained insights into hyperparameter settings. Ultimately, our system obtained a better result than the state-of-the-art system of WAT 2016. Our code is available at https://github.com/nem6ishi/wat17.
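The abstract only summarizes the initialization trick; the exact procedure is described in the paper and the linked repository. As a rough illustration of the general pattern it alludes to, here is a minimal sketch of initializing an NMT embedding layer from word vectors trained on one side of the parallel corpus itself. This is not the authors' implementation: the library choices (gensim, PyTorch), the function names, the vector dimension, and the handling of special tokens are all illustrative assumptions.

```python
# Hypothetical sketch of embedding-layer initialization from the parallel
# corpus itself; NOT the authors' code (see nem6ishi/wat17 for that).
import numpy as np
import torch
import torch.nn as nn
from gensim.models import Word2Vec


def pretrain_embeddings(sentences, dim=512):
    """Train word2vec on one side of the parallel corpus (tokenized sentences)."""
    return Word2Vec(sentences, vector_size=dim, window=5, min_count=1, workers=4)


def init_embedding_layer(vocab, w2v, dim=512):
    """Build an nn.Embedding whose rows are the pre-trained vectors.

    vocab: dict mapping token -> row index in the embedding matrix.
    Tokens unseen by word2vec (e.g. <pad>, <unk>) keep a random init.
    """
    weight = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype(np.float32)
    for token, idx in vocab.items():
        if token in w2v.wv:
            weight[idx] = w2v.wv[token]
    emb = nn.Embedding(len(vocab), dim)
    emb.weight.data.copy_(torch.from_numpy(weight))
    return emb


# Toy usage: this tiny "corpus" stands in for one side of ASPEC.
corpus = [["the", "system", "translates", "text"],
          ["the", "model", "uses", "attention"]]
vocab = {"<pad>": 0, "<unk>": 1}
for sent in corpus:
    for tok in sent:
        vocab.setdefault(tok, len(vocab))

w2v = pretrain_embeddings(corpus, dim=32)
encoder_embedding = init_embedding_layer(vocab, w2v, dim=32)
```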
Anthology ID: W17-5708
Volume: Proceedings of the 4th Workshop on Asian Translation (WAT2017)
Month: November
Year: 2017
Address: Taipei, Taiwan
Editors: Toshiaki Nakazawa, Isao Goto
Venue: WAT
Publisher: Asian Federation of Natural Language Processing
Pages: 99–109
URL: https://aclanthology.org/W17-5708
Cite (ACL):
Masato Neishi, Jin Sakuma, Satoshi Tohda, Shonosuke Ishiwatari, Naoki Yoshinaga, and Masashi Toyoda. 2017. A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), pages 99–109, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size (Neishi et al., WAT 2017)
PDF: https://preview.aclanthology.org/naacl24-info/W17-5708.pdf
Code: nem6ishi/wat17
Data: ASPEC