Abstract
This paper presents Lingua Custodia’s submission to the WMT23 Terminology shared task. Ensuring precise translation of technical terms plays a pivotal role in determining the final quality of machine translation output. Our goal is to follow the terminology constraints while applying the machine translation system. Inspired by recent work on terminology control, we propose to annotate the training data by leveraging a synthetic dictionary extracted in a fully unsupervised way from the given parallel corpora. The model trained on this data can then be used to translate text with a given terminology in a flexible manner. In addition, we introduce a careful re-sampling step for the annotated data in order to ensure that the model sees the different terminology types enough times. In this task we consider all three language directions: Chinese to English, English to Czech and German to English. Automatic evaluation metrics computed on the submitted systems show the effectiveness of the proposed method.
- Anthology ID:
- 2023.wmt-1.81
- Volume:
- Proceedings of the Eighth Conference on Machine Translation
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 897–901
- URL:
- https://aclanthology.org/2023.wmt-1.81
- DOI:
- 10.18653/v1/2023.wmt-1.81
- Cite (ACL):
- Jingshu Liu, Mariam Nakhlé, Gaëtan Caillout, and Raheel Qadar. 2023. Lingua Custodia’s Participation at the WMT 2023 Terminology Shared Task. In Proceedings of the Eighth Conference on Machine Translation, pages 897–901, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Lingua Custodia’s Participation at the WMT 2023 Terminology Shared Task (Liu et al., WMT 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2023.wmt-1.81.pdf