Abstract
In this paper, we describe our system for the WMT24 shared task on Low-Resource Indic Language Translation. We participate in the eng↔{as, kha, lus, mni} language pairs. In this shared task, we explore fine-tuning a pre-trained model whose pre-training objective brings embeddings closer together via alignment augmentation (Lin et al., 2020) for 22 scheduled Indian languages. Our primary system is based on language-specific fine-tuning of this pre-trained model. We achieve chrF2 scores of 50.6, 42.3, 54.9, and 66.3 on the official public test set for eng→as, eng→kha, eng→lus, and eng→mni, respectively. We also explore multilingual training with and without language grouping and layer-freezing.
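As a rough illustration of two techniques the abstract mentions, the sketch below shows encoder layer-freezing during fine-tuning and chrF2 scoring with sacrebleu. The checkpoint name, the number of frozen layers, and the elided training loop are placeholder assumptions for illustration, not the authors' actual setup.

```python
# Minimal sketch (not the authors' released code): language-specific
# fine-tuning of a pre-trained seq2seq MT model with encoder layer-freezing,
# plus chrF2 evaluation via sacrebleu.
from transformers import AutoModelForSeq2SeqLM
from sacrebleu.metrics import CHRF

# Placeholder checkpoint; the paper fine-tunes its own pre-trained model.
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50")

# Freeze the bottom k encoder layers (one layer-freezing variant; k is assumed).
k = 6
for layer in model.model.encoder.layers[:k]:
    for param in layer.parameters():
        param.requires_grad = False

# ... fine-tune on the eng→xx parallel data with a standard training loop ...

# chrF2 is sacrebleu's CHRF metric with beta=2 (its default configuration).
chrf = CHRF(char_order=6, word_order=0, beta=2)
hyps = ["system translation"]      # system outputs, one string per segment
refs = [["reference translation"]]  # one list per reference stream
print(chrf.corpus_score(hyps, refs))  # e.g. "chrF2 = ..."
```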
- Anthology ID: 2024.wmt-1.70
- Volume: Proceedings of the Ninth Conference on Machine Translation
- Month: November
- Year: 2024
- Address: Miami, Florida, USA
- Editors: Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
- Venue: WMT
- Publisher: Association for Computational Linguistics
- Pages: 781–787
- URL: https://preview.aclanthology.org/add_missing_videos/2024.wmt-1.70/
- DOI: 10.18653/v1/2024.wmt-1.70
- Cite (ACL): Pramit Sahoo, Maharaj Brahma, and Maunendra Sankar Desarkar. 2024. NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task. In Proceedings of the Ninth Conference on Machine Translation, pages 781–787, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal): NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task (Sahoo et al., WMT 2024)
- PDF: https://preview.aclanthology.org/add_missing_videos/2024.wmt-1.70.pdf