Samsung R&D Institute Philippines @ WMT 2024 Low-resource Languages of Spain Shared Task
Dan John Velasco, Manuel Antonio Rufino, Jan Christian Blaise Cruz
Abstract
This paper details the submission of Samsung R&D Institute Philippines (SRPH) Language Intelligence Team (LIT) to the WMT 2024 Low-resource Languages of Spain shared task. We trained translation models for Spanish to Aragonese, Spanish to Aranese/Occitan, and Spanish to Asturian using a standard sequence-to-sequence Transformer architecture, augmenting it with a noisy-channel reranking strategy to select better outputs during decoding. For Spanish to Asturian translation, our method reaches comparable BLEU scores to a strong commercial baseline translation system using only constrained data, backtranslations, noisy channel reranking, and a shared vocabulary spanning all four languages.
- Anthology ID:
- 2024.wmt-1.86
- Volume:
- Proceedings of the Ninth Conference on Machine Translation
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue:
- WMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 892–900
- URL:
- https://preview.aclanthology.org/add-emnlp-2024-awards/2024.wmt-1.86/
- DOI:
- 10.18653/v1/2024.wmt-1.86
- Cite (ACL):
- Dan John Velasco, Manuel Antonio Rufino, and Jan Christian Blaise Cruz. 2024. Samsung R&D Institute Philippines @ WMT 2024 Low-resource Languages of Spain Shared Task. In Proceedings of the Ninth Conference on Machine Translation, pages 892–900, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Samsung R&D Institute Philippines @ WMT 2024 Low-resource Languages of Spain Shared Task (Velasco et al., WMT 2024)
- PDF:
- https://preview.aclanthology.org/add-emnlp-2024-awards/2024.wmt-1.86.pdf