OdiaGenAI participation at WAT 2025

Debasish Dhal, Sambit Sekhar, Revathy V R, Shantipriya Parida, Akash Kumar Dhaka


Abstract
We at ODIAGEN provide a detailed description of the model, training procedure, results, and conclusions of our submission to the Workshop on Asian Translation (WAT 2025). This year, we focus only on text-to-text translation tasks for low-resource Indic languages, specifically targeting Hindi, Bengali, Malayalam, and Odia. The system uses the NLLB-200 large language model, fine-tuned on large datasets of over 100K rows for each targeted language. The training dataset consists of the data provided by the organisers, as in previous years, augmented by a much larger set of 100K sentences subsampled from the Samanantar dataset provided by AI4Bharat. On five of the eight evaluation/challenge test sets, our approach obtained the highest BLEU scores to date.
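The data augmentation described in the abstract can be pictured with a minimal sketch. The abstract does not say how the Samanantar subsample was drawn, so uniform random sampling with a fixed seed is an assumption here, and the function name `build_training_set` and its parameters are hypothetical, not taken from the paper.

```python
import random


def build_training_set(organiser_pairs, extra_pairs, sample_size=100_000, seed=0):
    """Combine organiser data with a random subsample of a larger corpus.

    Both inputs are lists of (source_sentence, target_sentence) tuples.
    Uniform sampling is an assumption; the paper may have used another scheme.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    k = min(sample_size, len(extra_pairs))  # never ask for more than is available
    sampled = rng.sample(extra_pairs, k)  # draw k pairs without replacement
    combined = list(organiser_pairs) + sampled
    rng.shuffle(combined)  # mix the two sources before training
    return combined
```

With `organiser_pairs` holding the WAT-provided sentence pairs for one language and `extra_pairs` holding the corresponding Samanantar pairs, the returned list would be the per-language fine-tuning set.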
Anthology ID:
2025.wat-1.11
Volume:
Proceedings of the Twelfth Workshop on Asian Translation (WAT 2025)
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Toshiaki Nakazawa, Isao Goto
Venues:
WAT | WS
Publisher:
Association for Computational Linguistics
Pages:
109–114
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wat-1.11/
Cite (ACL):
Debasish Dhal, Sambit Sekhar, Revathy V R, Shantipriya Parida, and Akash Kumar Dhaka. 2025. OdiaGenAI participation at WAT 2025. In Proceedings of the Twelfth Workshop on Asian Translation (WAT 2025), pages 109–114, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
OdiaGenAI participation at WAT 2025 (Dhal et al., WAT 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wat-1.11.pdf