IACS-LRILT: Machine Translation for Low-Resource Indic Languages

Dhairya Suman, Atanu Mandal, Santanu Pal, Sudip Naskar


Abstract
Even though machine translation has seen huge improvements in the last decade, translation quality for Indic languages is still underwhelming, which is attributed to the small amount of parallel data available. In this paper, we present our approach to mitigating the scarcity of parallel training data for Indic languages, especially for the language pairs English-Manipuri and Assamese-English. Our primary submission for the Manipuri-to-English translation task provided the best-scoring system for this language direction. We describe in detail the systems we built and our findings in the process.
Anthology ID:
2023.wmt-1.93
Volume:
Proceedings of the Eighth Conference on Machine Translation
Month:
December
Year:
2023
Address:
Singapore
Editors:
Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
972–977
URL:
https://aclanthology.org/2023.wmt-1.93
DOI:
10.18653/v1/2023.wmt-1.93
Cite (ACL):
Dhairya Suman, Atanu Mandal, Santanu Pal, and Sudip Naskar. 2023. IACS-LRILT: Machine Translation for Low-Resource Indic Languages. In Proceedings of the Eighth Conference on Machine Translation, pages 972–977, Singapore. Association for Computational Linguistics.
Cite (Informal):
IACS-LRILT: Machine Translation for Low-Resource Indic Languages (Suman et al., WMT 2023)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2023.wmt-1.93.pdf