Abstract
The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. We present a faster and simpler architecture based on a succession of convolutional layers. This allows the entire source sentence to be encoded simultaneously, in contrast to recurrent networks, whose computation is constrained by temporal dependencies. On WMT’16 English-Romanian translation we achieve accuracy competitive with the state of the art, and on WMT’15 English-German we outperform several recently published results. Our models obtain almost the same accuracy as a very deep LSTM setup on WMT’14 English-French translation. We speed up CPU decoding by more than a factor of two at the same or higher accuracy than a strong bi-directional LSTM.
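The architectural point in the abstract is that a stack of convolutions, unlike an RNN, has no sequential dependency across time steps, so every source position can be processed in parallel. The sketch below illustrates this idea in PyTorch; it is not the authors' implementation, and the layer count, dimensions, padding scheme, and tanh nonlinearity are illustrative assumptions (the paper's model is more elaborate, e.g. it uses separate encoder stacks for attention and for the conditional input).

```python
# Minimal sketch of a convolutional sentence encoder for NMT.
# Assumptions (not from the paper): hyperparameter values, tanh
# nonlinearity, and a single encoder stack.
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=256,
                 kernel_size=3, num_layers=6, max_positions=1024):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # Position embeddings compensate for the convolution's lack of
        # any inherent notion of token order.
        self.pos_embed = nn.Embedding(max_positions, embed_dim)
        pad = kernel_size // 2  # "same" padding keeps sequence length fixed
        self.layers = nn.ModuleList([
            nn.Conv1d(embed_dim if i == 0 else hidden_dim, hidden_dim,
                      kernel_size, padding=pad)
            for i in range(num_layers)
        ])
        self.activation = nn.Tanh()

    def forward(self, src_tokens):
        # src_tokens: (batch, src_len) of token indices
        positions = torch.arange(src_tokens.size(1), device=src_tokens.device)
        x = self.word_embed(src_tokens) + self.pos_embed(positions)
        x = x.transpose(1, 2)  # (batch, channels, src_len) for Conv1d
        for conv in self.layers:
            # All source positions are computed in parallel here; an RNN
            # would have to step through them one at a time.
            x = self.activation(conv(x))
        return x.transpose(1, 2)  # (batch, src_len, hidden_dim)
```

Stacking several narrow convolutions widens the receptive field layer by layer, so distant source words can still influence each other's representations without any recurrent connection.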
- Anthology ID:
- P17-1012
- Volume:
- Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2017
- Address:
- Vancouver, Canada
- Editors:
- Regina Barzilay, Min-Yen Kan
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 123–135
- URL:
- https://aclanthology.org/P17-1012
- DOI:
- 10.18653/v1/P17-1012
- Cite (ACL):
- Jonas Gehring, Michael Auli, David Grangier, and Yann Dauphin. 2017. A Convolutional Encoder Model for Neural Machine Translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 123–135, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- A Convolutional Encoder Model for Neural Machine Translation (Gehring et al., ACL 2017)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/P17-1012.pdf
- Code:
- additional community code
- Data:
- WMT 2014, WMT 2016, WMT 2016 News