Abstract
In this paper, we describe the constrained submission systems of Samsung R&D Institute Philippines to the WMT 2023 General Translation Task for two directions: en->he and he->en. Our systems comprise Transformer-based sequence-to-sequence models trained with a mix of best practices: comprehensive data preprocessing pipelines, synthetic backtranslated data, and the use of noisy channel reranking during online decoding. Despite having significantly fewer parameters, our models perform comparably to, and sometimes outperform, strong unconstrained baseline systems such as mBART50 M2M and NLLB 200 MoE on two public benchmarks: FLORES-200 and NTREX-128.
- Anthology ID:
- 2023.wmt-1.6
- Volume:
- Proceedings of the Eighth Conference on Machine Translation
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 103–109
- URL:
- https://aclanthology.org/2023.wmt-1.6
- DOI:
- 10.18653/v1/2023.wmt-1.6
- Cite (ACL):
- Jan Christian Blaise Cruz. 2023. Samsung R&D Institute Philippines at WMT 2023. In Proceedings of the Eighth Conference on Machine Translation, pages 103–109, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Samsung R&D Institute Philippines at WMT 2023 (Cruz, WMT 2023)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2023.wmt-1.6.pdf
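The abstract mentions noisy channel reranking during online decoding. The sketch below is a hypothetical illustration of the general technique, not the paper's implementation: beam candidates are rescored with a weighted sum of the direct model, the channel (reverse) model, and a language model log-probability; all function names, weights, and scores here are assumptions for illustration.

```python
# Hypothetical sketch of noisy channel reranking (not the paper's exact code).
# score(y|x) = log p(y|x) + lam_channel * log p(x|y) + lam_lm * log p(y)

def noisy_channel_score(log_p_direct, log_p_channel, log_p_lm,
                        lam_channel=1.0, lam_lm=1.0):
    """Combine the three log-probabilities into a single reranking score."""
    return log_p_direct + lam_channel * log_p_channel + lam_lm * log_p_lm

def rerank(candidates):
    """candidates: list of (translation, log_p_direct, log_p_channel, log_p_lm).

    Returns the candidates sorted best-first by noisy channel score.
    """
    return sorted(
        candidates,
        key=lambda c: noisy_channel_score(c[1], c[2], c[3]),
        reverse=True,
    )

# Toy example: three beam candidates with made-up log-probabilities.
beam = [
    ("cand_a", -1.2, -3.0, -2.5),  # total: -6.7
    ("cand_b", -1.5, -1.0, -2.0),  # total: -4.5  <- best combined score
    ("cand_c", -0.9, -4.0, -3.5),  # total: -8.4
]
best = rerank(beam)[0][0]
```

Note that the candidate favored by the direct model alone (`cand_c`) is demoted once the channel and language model scores are folded in, which is the point of the technique.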