Feature Decay Algorithms for Neural Machine Translation
Alberto Poncelas, Gideon Maillette de Buy Wenniger, Andy Way
Abstract
Neural Machine Translation (NMT) systems require a lot of data to be competitive. For this reason, data selection techniques are used only for fine-tuning systems that have been trained with larger amounts of data. In this work we aim to use Feature Decay Algorithms (FDA) data selection techniques not only to fine-tune a system but also to build a complete system with less data. Our findings reveal that it is possible to find a subset of sentence pairs that, when used for training a German-English NMT system, outperforms the full training corpus by 1.11 BLEU points.
- Anthology ID:
- 2018.eamt-main.24
- Volume:
- Proceedings of the 21st Annual Conference of the European Association for Machine Translation
- Month:
- May
- Year:
- 2018
- Address:
- Alicante, Spain
- Editors:
- Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, Maja Popović, Celia Rico, André Martins, Joachim Van den Bogaert, Mikel L. Forcada
- Venue:
- EAMT
- Pages:
- 259–268
- URL:
- https://aclanthology.org/2018.eamt-main.24
- Cite (ACL):
- Alberto Poncelas, Gideon Maillette de Buy Wenniger, and Andy Way. 2018. Feature Decay Algorithms for Neural Machine Translation. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pages 259–268, Alicante, Spain.
- Cite (Informal):
- Feature Decay Algorithms for Neural Machine Translation (Poncelas et al., EAMT 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2018.eamt-main.24.pdf