Domain Adaptation of Document-Level NMT in IWSLT19

Martin Popel, Christian Federmann


Abstract
We describe our four NMT systems submitted to the IWSLT19 shared task in English→Czech text-to-text translation of TED talks. The goal of this study is to understand the interactions between document-level NMT and domain adaptation. All our systems are based on the Transformer model implemented in the Tensor2Tensor framework. Two of the systems serve as baselines, which are not adapted to the TED talks domain: SENTBASE is trained on single sentences, DOCBASE on multi-sentence (document-level) sequences. The other two submitted systems are adapted to TED talks: SENTFINE is fine-tuned on single sentences, DOCFINE is fine-tuned on multi-sentence sequences. We present both automatic-metrics evaluation and manual analysis of the translation quality, focusing on the differences between the four systems.
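The document-level systems (DOCBASE, DOCFINE) are trained on multi-sentence sequences rather than isolated sentences. As an illustration of the general idea (not the authors' exact preprocessing, which the paper itself specifies), the sketch below greedily packs consecutive sentences from one document into training sequences capped at a token budget; `max_tokens` and whitespace tokenization are assumptions for the example.

```python
def make_doc_sequences(sentences, max_tokens=100, sep=" "):
    """Greedily pack consecutive sentences of one document into
    multi-sentence training sequences of at most max_tokens
    (counted here as whitespace-separated tokens, for illustration)."""
    sequences, current, current_len = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        # Flush the current sequence if adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            sequences.append(sep.join(current))
            current, current_len = [], 0
        current.append(sent)
        current_len += n
    if current:
        sequences.append(sep.join(current))
    return sequences
```

A sentence-level baseline corresponds to the degenerate case of one sentence per sequence; the document-level variants give the model cross-sentence context at training time.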
Anthology ID:
2019.iwslt-1.8
Volume:
Proceedings of the 16th International Conference on Spoken Language Translation
Month:
November 2-3
Year:
2019
Address:
Hong Kong
Editors:
Jan Niehues, Rolando Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loic Barrault, Lucia Specia, Marcello Federico
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2019.iwslt-1.8
Cite (ACL):
Martin Popel and Christian Federmann. 2019. Domain Adaptation of Document-Level NMT in IWSLT19. In Proceedings of the 16th International Conference on Spoken Language Translation, Hong Kong. Association for Computational Linguistics.
Cite (Informal):
Domain Adaptation of Document-Level NMT in IWSLT19 (Popel & Federmann, IWSLT 2019)
PDF:
https://preview.aclanthology.org/add_acl24_videos/2019.iwslt-1.8.pdf
Data
MuST-C, WMT 2018