Abstract
Even though machine translation has seen huge improvements in the last decade, translation quality for Indic languages is still underwhelming, which is attributed to the small amount of parallel data available. In this paper, we present our approach to mitigating the scarcity of parallel training data for Indic languages, especially for the language pairs English-Manipuri and Assamese-English. Our primary submission for the Manipuri-to-English translation task was the best-scoring system for this language direction. We describe the systems we built in detail, along with our findings in the process.
- Anthology ID:
- 2023.wmt-1.93
- Volume:
- Proceedings of the Eighth Conference on Machine Translation
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Philipp Koehn, Barry Haddow, Tom Kocmi, Christof Monz
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 972–977
- URL:
- https://aclanthology.org/2023.wmt-1.93
- DOI:
- 10.18653/v1/2023.wmt-1.93
- Cite (ACL):
- Dhairya Suman, Atanu Mandal, Santanu Pal, and Sudip Naskar. 2023. IACS-LRILT: Machine Translation for Low-Resource Indic Languages. In Proceedings of the Eighth Conference on Machine Translation, pages 972–977, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- IACS-LRILT: Machine Translation for Low-Resource Indic Languages (Suman et al., WMT 2023)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2023.wmt-1.93.pdf