Lingua Custodia at WMT’19: Attempts to Control Terminology

Franck Burlot


Abstract
This paper describes Lingua Custodia’s submission to the WMT’19 news shared task for German-to-French on the topic of the EU elections. We report experiments on adapting the terminology of a machine translation system to a specific topic, with the aim of translating specific entities such as political parties and person names more accurately, given that the shared task provided no in-domain parallel training data on this restricted topic. Our primary submission to the shared task uses backtranslation generated with a type of decoding that allows constraints to be inserted in the output, in order to guarantee the correct translation of specific terms that are not necessarily observed in the data.
Anthology ID:
W19-5310
Volume:
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
147–154
URL:
https://aclanthology.org/W19-5310
DOI:
10.18653/v1/W19-5310
Cite (ACL):
Franck Burlot. 2019. Lingua Custodia at WMT’19: Attempts to Control Terminology. In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 147–154, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Lingua Custodia at WMT’19: Attempts to Control Terminology (Burlot, WMT 2019)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/W19-5310.pdf