Massive Exploration of Neural Machine Translation Architectures

Denny Britz, Anna Goldie, Minh-Thang Luong, Quoc Le


Abstract
Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end-users. As the field is moving rapidly, it has become unclear which elements of NMT architectures have a significant impact on translation quality. In this work, we present a large-scale analysis of the sensitivity of NMT architectures to common hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on a WMT English to German translation task. Our experiments provide practical insights into the relative importance of factors such as embedding size, network depth, RNN cell type, residual connections, attention mechanism, and decoding heuristics. As part of this contribution, we also release an open-source NMT framework in TensorFlow to make it easy for others to reproduce our results and perform their own experiments.
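The abstract lists the hyperparameters the study varies. As a rough illustration only (this is not the authors' released code), the minimal TensorFlow/Keras sketch below shows where each factor enters a sequence-to-sequence model; the builder function, its default values, and the use of Keras' dot-product (Luong-style) attention layer are assumptions made for this example. Beam width, the main decoding heuristic studied, applies at inference time and is therefore not part of the model graph.

# Hypothetical sketch, not the paper's released implementation:
# a minimal encoder-decoder showing where the studied
# hyperparameters enter the architecture.
import tensorflow as tf

def build_nmt_model(src_vocab=32000, tgt_vocab=32000,
                    embedding_dim=512,   # embedding size
                    hidden_dim=512,
                    num_layers=2,        # encoder/decoder depth
                    cell_type="lstm",    # RNN cell type: "lstm" or "gru"
                    residual=True):      # residual connections between layers
    rnn = tf.keras.layers.LSTM if cell_type == "lstm" else tf.keras.layers.GRU

    src = tf.keras.Input(shape=(None,), dtype="int32")
    tgt = tf.keras.Input(shape=(None,), dtype="int32")

    # Encoder: stacked RNN layers, optionally with residual connections
    # (the residual add requires matching layer widths).
    x = tf.keras.layers.Embedding(src_vocab, embedding_dim)(src)
    for _ in range(num_layers):
        h = rnn(hidden_dim, return_sequences=True)(x)
        x = tf.keras.layers.Add()([x, h]) if residual and h.shape[-1] == x.shape[-1] else h
    enc_out = x

    # Decoder: the same stacking scheme over the target embeddings
    # (teacher forcing: gold target tokens are fed as inputs).
    y = tf.keras.layers.Embedding(tgt_vocab, embedding_dim)(tgt)
    for _ in range(num_layers):
        h = rnn(hidden_dim, return_sequences=True)(y)
        y = tf.keras.layers.Add()([y, h]) if residual and h.shape[-1] == y.shape[-1] else h

    # Attention: each decoder state attends over the encoder states;
    # Keras' Attention layer computes dot-product (Luong-style) scores,
    # one of the attention variants the paper compares.
    context = tf.keras.layers.Attention()([y, enc_out])
    y = tf.keras.layers.Concatenate()([y, context])
    logits = tf.keras.layers.Dense(tgt_vocab)(y)
    return tf.keras.Model([src, tgt], logits)

The released google/seq2seq framework exposes comparable knobs (cell type, depth, embedding size, attention variant, beam width) through its configuration files rather than through code like the above.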
Anthology ID: D17-1151
Volume: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month: September
Year: 2017
Address: Copenhagen, Denmark
Editors: Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue: EMNLP
SIG: SIGDAT
Publisher: Association for Computational Linguistics
Pages: 1442–1451
URL: https://aclanthology.org/D17-1151
DOI: 10.18653/v1/D17-1151
Bibkey: britz-etal-2017-massive
Cite (ACL): Denny Britz, Anna Goldie, Minh-Thang Luong, and Quoc Le. 2017. Massive Exploration of Neural Machine Translation Architectures. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1442–1451, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal): Massive Exploration of Neural Machine Translation Architectures (Britz et al., EMNLP 2017)
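For convenience, the ACL citation above can also be expressed as a BibTeX entry assembled from the metadata on this page; the entry key follows the Anthology's usual author-year-keyword pattern and is an assumption.

@inproceedings{britz-etal-2017-massive,
    title = "Massive Exploration of Neural Machine Translation Architectures",
    author = "Britz, Denny and Goldie, Anna and Luong, Minh-Thang and Le, Quoc",
    editor = "Palmer, Martha and Hwa, Rebecca and Riedel, Sebastian",
    booktitle = "Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing",
    month = sep,
    year = "2017",
    address = "Copenhagen, Denmark",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D17-1151",
    doi = "10.18653/v1/D17-1151",
    pages = "1442--1451",
}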
PDF: https://preview.aclanthology.org/ingest-2024-clasp/D17-1151.pdf
Code: google/seq2seq (https://github.com/google/seq2seq), plus additional community code