Neural vs. Phrase-Based Machine Translation in a Multi-Domain Scenario

M. Amin Farajian, Marco Turchi, Matteo Negri, Nicola Bertoldi, Marcello Federico
Abstract
State-of-the-art neural machine translation (NMT) systems are generally trained on specific domains by carefully selecting the training sets and applying proper domain adaptation techniques. In this paper we consider the real-world scenario in which the target domain is not predefined, hence the system should be able to translate text from multiple domains. We compare the performance of a generic NMT system and a phrase-based statistical machine translation (PBMT) system by training them on a generic parallel corpus composed of data from different domains. Our results on multi-domain English-French data show that, in these realistic conditions, PBMT outperforms its neural counterpart. This raises the question: is NMT ready for deployment as a generic/multi-purpose MT backbone in real-world settings?
Anthology ID:
E17-2045
Volume:
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Mirella Lapata, Phil Blunsom, Alexander Koller
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
280–284
URL:
https://aclanthology.org/E17-2045
Cite (ACL):
M. Amin Farajian, Marco Turchi, Matteo Negri, Nicola Bertoldi, and Marcello Federico. 2017. Neural vs. Phrase-Based Machine Translation in a Multi-Domain Scenario. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 280–284, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Neural vs. Phrase-Based Machine Translation in a Multi-Domain Scenario (Farajian et al., EACL 2017)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/E17-2045.pdf