Abstract
We describe Vicomtech’s participation in the WMT 2024 Shared Task on translation into low-resource languages of Spain. We addressed all three languages of the task, namely Aragonese, Aranese and Asturian, in both constrained and open settings. Our work mainly centred on exploiting different types of corpora via data filtering, selection and combination methods, along with synthetic data generated with translation models based on rules, neural sequence-to-sequence or large language models. We improved or matched the best baselines in all three language pairs and present complementary results on additional test sets.
- Anthology ID:
- 2024.wmt-1.91
- Volume:
- Proceedings of the Ninth Conference on Machine Translation
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue:
- WMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 934–942
- URL:
- https://aclanthology.org/2024.wmt-1.91
- DOI:
- 10.18653/v1/2024.wmt-1.91
- Cite (ACL):
- David Ponce, Harritxu Gete, and Thierry Etchegoyhen. 2024. Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain. In Proceedings of the Ninth Conference on Machine Translation, pages 934–942, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain (Ponce et al., WMT 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.wmt-1.91.pdf