Dense Information Flow for Neural Machine Translation

Yanyao Shen, Xu Tan, Di He, Tao Qin, Tie-Yan Liu


Abstract
Recently, neural machine translation (NMT) has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework. From the optimization perspective, most of these deep architectures adopt residual connections to improve learning in both the encoder and the decoder, and apply advanced attention connections as well. Inspired by the success of the DenseNet model on computer vision problems, in this paper we propose a densely connected NMT architecture (DenseNMT) that trains more efficiently. DenseNMT not only uses dense connections when creating new features in both the encoder and the decoder, but also adopts a dense attention structure to improve attention quality. Our experiments on multiple datasets show that the DenseNMT structure is more competitive and efficient.
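The core idea the abstract describes is DenseNet-style connectivity: each layer receives the concatenation of the outputs of all preceding layers rather than only the previous one. Below is a minimal PyTorch sketch of that connectivity pattern for an encoder stack. It is an illustration under assumptions, not the paper's exact architecture: the class names, dimensions, and the use of a plain linear sub-layer (in place of the paper's convolutional/attention sub-layers) are all hypothetical.

```python
import torch
import torch.nn as nn


class DenseEncoderLayer(nn.Module):
    """One densely connected layer: its input is the concatenation of the
    original embeddings and every previous layer's output (DenseNet-style).
    A linear projection stands in for the paper's actual sub-layer."""

    def __init__(self, in_dim, growth_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, growth_dim)

    def forward(self, x):
        return torch.relu(self.proj(x))


class DenseEncoder(nn.Module):
    def __init__(self, embed_dim=256, growth_dim=128, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = embed_dim
        for _ in range(num_layers):
            self.layers.append(DenseEncoderLayer(in_dim, growth_dim))
            in_dim += growth_dim  # each layer widens the running feature stack

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, embed_dim)
        features = [embeddings]
        for layer in self.layers:
            # dense connection: the layer sees ALL earlier features, concatenated
            new_feat = layer(torch.cat(features, dim=-1))
            features.append(new_feat)
        return torch.cat(features, dim=-1)


# Usage: a toy batch of 2 sentences, 10 tokens each
enc = DenseEncoder()
out = enc(torch.randn(2, 10, 256))  # shape: (2, 10, 256 + 4 * 128)
```

Note the design choice this mirrors from DenseNet: concatenation, unlike residual addition, preserves earlier features unmodified, so later layers (and, in DenseNMT, the dense attention structure) can draw on low-level and high-level representations at once.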
Anthology ID:
N18-1117
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
1294–1303
URL:
https://aclanthology.org/N18-1117
DOI:
10.18653/v1/N18-1117
Cite (ACL):
Yanyao Shen, Xu Tan, Di He, Tao Qin, and Tie-Yan Liu. 2018. Dense Information Flow for Neural Machine Translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1294–1303, New Orleans, Louisiana. Association for Computational Linguistics.
Cite (Informal):
Dense Information Flow for Neural Machine Translation (Shen et al., NAACL 2018)
PDF:
https://aclanthology.org/N18-1117.pdf
Code
 yanyao-shen/fairseq
Data
WMT 2014