Generic and Specialized Word Embeddings for Multi-Domain Machine Translation

MinhQuang Pham, Josep Crego, François Yvon, Jean Senellart


Abstract
Supervised machine translation works well when the train and test data are sampled from the same distribution. When this is not the case, adaptation techniques help ensure that the knowledge learned from out-of-domain texts generalises to in-domain sentences. We study here a related setting, multi-domain adaptation, where the number of domains is potentially large and adapting separately to each domain would waste training resources. Our proposal transposes to neural machine translation the feature expansion technique of Daumé III (2007): it isolates domain-agnostic from domain-specific lexical representations, while sharing most of the network across domains. Our experiments use two architectures and two language pairs: they show that our approach, while simple and computationally inexpensive, outperforms several strong baselines and delivers a multi-domain system that successfully translates texts from diverse sources.
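The core idea, feature augmentation in the style of Daumé III (2007) applied to word embeddings, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the table sizes, initialisation, and the `embed` lookup are assumptions made for the example. Each token's representation concatenates a generic embedding shared by all domains with an embedding drawn from a domain-specific table.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the paper).
VOCAB, DIM = 1000, 8
DOMAINS = ["news", "medical", "it"]

rng = np.random.default_rng(0)
generic = rng.normal(size=(VOCAB, DIM))                          # shared across all domains
specific = {d: rng.normal(size=(VOCAB, DIM)) for d in DOMAINS}   # one table per domain

def embed(token_id: int, domain: str) -> np.ndarray:
    """Augmented embedding: [generic part ; domain-specific part]."""
    return np.concatenate([generic[token_id], specific[domain][token_id]])

v = embed(42, "medical")   # a (2 * DIM)-dimensional vector
```

The generic half lets all domains pool their training data for a word, while the domain-specific half captures domain-dependent usage; the rest of the network stays shared.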
Anthology ID:
2019.iwslt-1.26
Volume:
Proceedings of the 16th International Conference on Spoken Language Translation
Month:
November 2-3
Year:
2019
Address:
Hong Kong
Editors:
Jan Niehues, Rolando Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loic Barrault, Lucia Specia, Marcello Federico
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2019.iwslt-1.26
Cite (ACL):
MinhQuang Pham, Josep Crego, François Yvon, and Jean Senellart. 2019. Generic and Specialized Word Embeddings for Multi-Domain Machine Translation. In Proceedings of the 16th International Conference on Spoken Language Translation, Hong Kong. Association for Computational Linguistics.
Cite (Informal):
Generic and Specialized Word Embeddings for Multi-Domain Machine Translation (Pham et al., IWSLT 2019)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2019.iwslt-1.26.pdf