Evaluating IndicTrans2 and ByT5 for English–Santali Machine Translation Using the Ol Chiki Script

Kshetrimayum Boynao Singh, Asif Ekbal, Partha Pakray


Abstract
In this study, we evaluate two multilingual NMT models, IndicTrans2 and ByT5, for bidirectional English–Santali translation using the Ol Chiki script. The models are trained on the MMLoSo Shared Task dataset, supplemented with public English–Santali resources, and evaluated on the AI4Bharat IN22 and Flores test sets, specifically IN22-Gen and Flores200-dev. The fine-tuned IndicTrans2 substantially outperforms ByT5 in both directions. On IN22-Gen, it achieves 26.8 BLEU and 53.9 chrF++ for Santali→English and 7.3 BLEU and 40.3 chrF++ for English→Santali, compared to ByT5’s 5.6 BLEU and 30.2 chrF++ for Santali→English and 2.9 BLEU and 32.6 chrF++ for English→Santali. On the Flores test set, the fine-tuned IndicTrans2 achieves 22 BLEU and 49.2 chrF++, and 4.7 BLEU and 32.7 chrF++, across the two translation directions, again surpassing ByT5. Although ByT5’s byte-level modelling is script-agnostic, it struggles with Santali morphology, whereas IndicTrans2 benefits from multilingual pre-training and script unification.
Anthology ID:
2025.mmloso-1.9
Volume:
Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025)
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Ankita Shukla, Sandeep Kumar, Amrit Singh Bedi, Tanmoy Chakraborty
Venues:
MMLoSo | WS
Publisher:
Association for Computational Linguistics
Pages:
95–100
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.mmloso-1.9/
Cite (ACL):
Kshetrimayum Boynao Singh, Asif Ekbal, and Partha Pakray. 2025. Evaluating IndicTrans2 and ByT5 for English–Santali Machine Translation Using the Ol Chiki Script. In Proceedings of the 1st Workshop on Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo 2025), pages 95–100, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
Evaluating IndicTrans2 and ByT5 for English–Santali Machine Translation Using the Ol Chiki Script (Singh et al., MMLoSo 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.mmloso-1.9.pdf