Samsung R&D Institute Philippines at WMT 2023

Jan Christian Blaise Cruz


Abstract
In this paper, we describe the constrained submission systems of Samsung R&D Institute Philippines to the WMT 2023 General Translation Task for two directions: en->he and he->en. Our systems consist of Transformer-based sequence-to-sequence models trained with a mix of best practices: comprehensive data preprocessing pipelines, synthetic backtranslated data, and noisy channel reranking during online decoding. On two public benchmarks, FLORES-200 and NTREX-128, our models perform comparably to, and sometimes outperform, strong unconstrained baseline systems such as mBART50 M2M and NLLB 200 MoE, despite having significantly fewer parameters.
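The abstract names noisy channel reranking without spelling out the scoring rule. The sketch below shows one common formulation, assuming each candidate translation y of a source x already carries three log-probabilities: a direct model score log P(y|x), a channel (reverse-direction) model score log P(x|y), and a target-side language model score log P(y). The class and function names, the weights, and the whitespace-based length normalization are illustrative assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A hypothetical container for one candidate translation and its scores."""
    text: str
    direct_logprob: float   # log P(y | x), from the forward translation model
    channel_logprob: float  # log P(x | y), from the reverse translation model
    lm_logprob: float       # log P(y), from a target-side language model


def noisy_channel_score(c: Candidate,
                        channel_weight: float = 1.0,
                        lm_weight: float = 1.0) -> float:
    """Weighted sum of the three log-probabilities, length-normalized.

    Assumption: length is counted in whitespace tokens, a simplification.
    """
    length = max(len(c.text.split()), 1)
    total = (c.direct_logprob
             + channel_weight * c.channel_logprob
             + lm_weight * c.lm_logprob)
    return total / length


def rerank(candidates: List[Candidate],
           channel_weight: float = 1.0,
           lm_weight: float = 1.0) -> List[Candidate]:
    """Return candidates sorted from best to worst combined score."""
    return sorted(
        candidates,
        key=lambda c: noisy_channel_score(c, channel_weight, lm_weight),
        reverse=True,
    )
```

With both weights set to zero this reduces to ranking by the direct model alone; in practice the weights would be tuned on a development set.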
Anthology ID:
2023.wmt-1.6
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
103–109
URL:
https://aclanthology.org/2023.wmt-1.6
DOI:
10.18653/v1/2023.wmt-1.6
Cite (ACL):
Jan Christian Blaise Cruz. 2023. Samsung R&D Institute Philippines at WMT 2023. In Proceedings of the Eighth Conference on Machine Translation, pages 103–109, Singapore. Association for Computational Linguistics.
Cite (Informal):
Samsung R&D Institute Philippines at WMT 2023 (Cruz, WMT 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.wmt-1.6.pdf