A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size
Masato Neishi, Jin Sakuma, Satoshi Tohda, Shonosuke Ishiwatari, Naoki Yoshinaga, Masashi Toyoda
Abstract
In this paper, we describe the team UT-IIS’s system and results for the WAT 2017 translation tasks. We further investigated several tricks, including a novel technique for initializing embedding layers using only the parallel corpus, which increased the BLEU score by 1.28; we also found a practical large batch size of 256 and gained insights into hyperparameter settings. Ultimately, our system obtained a better result than the state-of-the-art system of WAT 2016. Our code is available at https://github.com/nem6ishi/wat17.
- Anthology ID:
- W17-5708
- Volume:
- Proceedings of the 4th Workshop on Asian Translation (WAT2017)
- Month:
- November
- Year:
- 2017
- Address:
- Taipei, Taiwan
- Editors:
- Toshiaki Nakazawa, Isao Goto
- Venue:
- WAT
- Publisher:
- Asian Federation of Natural Language Processing
- Pages:
- 99–109
- URL:
- https://aclanthology.org/W17-5708
- Cite (ACL):
- Masato Neishi, Jin Sakuma, Satoshi Tohda, Shonosuke Ishiwatari, Naoki Yoshinaga, and Masashi Toyoda. 2017. A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), pages 99–109, Taipei, Taiwan. Asian Federation of Natural Language Processing.
- Cite (Informal):
- A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size (Neishi et al., WAT 2017)
- PDF:
- https://preview.aclanthology.org/naacl24-info/W17-5708.pdf
- Code:
- nem6ishi/wat17
- Data:
- ASPEC
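The abstract's main trick is initializing the NMT embedding layers from word vectors trained on the parallel corpus itself. The paper's exact procedure is in the PDF; the snippet below is only a minimal sketch of the final initialization step, assuming word vectors have already been pretrained (e.g., with word2vec) on the source and target sides of the training data. The function name `init_embedding` and all variable names are illustrative, not taken from the authors' code.

```python
import numpy as np

def init_embedding(vocab, pretrained, dim, seed=0):
    """Build an embedding matrix of shape (len(vocab), dim).

    Rows for words that have a pretrained vector (trained on the
    parallel corpus) are copied in; all other rows keep a small
    random initialization, as is standard for NMT embeddings.
    Returns the matrix and the number of rows that were copied.
    """
    rng = np.random.default_rng(seed)
    emb = rng.normal(scale=0.1, size=(len(vocab), dim))
    hits = 0
    for word, idx in vocab.items():
        vec = pretrained.get(word)
        if vec is not None:
            emb[idx] = vec
            hits += 1
    return emb, hits
```

The resulting matrix would then be loaded as the initial weights of the encoder (or decoder) embedding layer before NMT training begins; words missing from the pretrained vectors, such as `<unk>` or rare subwords, simply keep their random rows.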