Machine Translation Advancements for Low-Resource Indian Languages in WMT23: CFILT-IITB’s Effort for Bridging the Gap
Pranav Gaikwad, Meet Doshi, Sourabh Deoghare, Pushpak Bhattacharyya
Abstract
This paper describes the MT systems that the CFILT-IITB team submitted to the WMT23 IndicMT shared task. The task focused on developing MT systems from/to English and four low-resource North-East Indian languages, viz., Assamese, Khasi, Manipuri, and Mizo. Training these systems on a small parallel corpus alone resulted in poor quality. We therefore used transfer learning with the help of a large pre-trained multilingual NMT system. Since this approach produced the best results, we submitted the NMT models built with it to the shared task.
- Anthology ID:
- 2023.wmt-1.89
- Volume:
- Proceedings of the Eighth Conference on Machine Translation
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 950–953
- URL:
- https://aclanthology.org/2023.wmt-1.89
- DOI:
- 10.18653/v1/2023.wmt-1.89
- Cite (ACL):
- Pranav Gaikwad, Meet Doshi, Sourabh Deoghare, and Pushpak Bhattacharyya. 2023. Machine Translation Advancements for Low-Resource Indian Languages in WMT23: CFILT-IITB’s Effort for Bridging the Gap. In Proceedings of the Eighth Conference on Machine Translation, pages 950–953, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Machine Translation Advancements for Low-Resource Indian Languages in WMT23: CFILT-IITB’s Effort for Bridging the Gap (Gaikwad et al., WMT 2023)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2023.wmt-1.89.pdf