Abstract
This paper introduces a translation of the FLORES+ dataset into the endangered Erzya language. The goal is to evaluate machine translation between Erzya and any of the other 200 languages already included in FLORES+. This translation was carried out as part of the Open Language Data shared task at WMT24. We also present a benchmark of existing translation models based on this dataset, as well as a new translation model that achieves state-of-the-art quality for translation into Erzya from Russian and English.
- Anthology ID:
- 2024.wmt-1.49
- Volume:
- Proceedings of the Ninth Conference on Machine Translation
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue:
- WMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 614–623
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.wmt-1.49/
- DOI:
- 10.18653/v1/2024.wmt-1.49
- Cite (ACL):
- Isai Gordeev, Sergey Kuldin, and David Dale. 2024. FLORES+ Translation and Machine Translation Evaluation for the Erzya Language. In Proceedings of the Ninth Conference on Machine Translation, pages 614–623, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- FLORES+ Translation and Machine Translation Evaluation for the Erzya Language (Gordeev et al., WMT 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.wmt-1.49.pdf