Rucha Ambaliya
2025
Niyamika at BHASHA Task 1: Word-Level Transliteration for English-Hindi Mixed Text in Grammar Correction Using MT5
Rucha Ambaliya | Mahika Dugar | Pruthwik Mishra
Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)
Grammar correction for Indian languages poses significant challenges due to complex morphology, non-standard spellings, and frequent script variations. In this work, we address grammar correction for English-mixed sentences in five Indic languages—Hindi, Bengali, Malayalam, Tamil, and Telugu—as part of the IndicGEC 2025 Bhasha Workshop. Our approach first applies word-level transliteration using IndicTrans (Bhat et al., 2014) to normalize Romanized and mixed-script tokens, followed by grammar correction using the mT5-small model (Xue et al., 2021). Although our experiments focus on these five languages, the methodology is generalizable to other Indian languages. Our implementation and code are publicly available at: https://github.com/Rucha-Ambaliya/bhasha-workshop
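The two-stage flow described in the abstract (word-level script normalization, then grammar correction) can be illustrated with a minimal sketch. This is not the authors' implementation. The lexicon, `toy_transliterate`, `normalize_script`, and `correct_sentence` are hypothetical names introduced here. The lookup table stands in for a real transliteration model such as IndicTrans, and the `corrector` argument stands in for a fine-tuned mT5-small model. The sketch also assumes that any whitespace-separated token containing a Latin letter should be transliterated.

```python
import re
from typing import Callable, Dict

# Assumption: a token containing any basic Latin letter counts as Romanized.
_LATIN = re.compile(r"[A-Za-z]")

# Hypothetical toy lexicon standing in for a real transliteration model.
TOY_LEXICON: Dict[str, str] = {
    "school": "स्कूल",
    "late": "लेट",
}


def toy_transliterate(word: str) -> str:
    """Placeholder transliterator: lexicon lookup, else return the word unchanged."""
    return TOY_LEXICON.get(word.lower(), word)


def normalize_script(
    sentence: str,
    transliterate: Callable[[str], str] = toy_transliterate,
) -> str:
    """Transliterate only tokens with Latin letters; leave other tokens untouched."""
    tokens = sentence.split()
    return " ".join(transliterate(t) if _LATIN.search(t) else t for t in tokens)


def correct_sentence(
    sentence: str,
    corrector: Callable[[str], str],
    transliterate: Callable[[str], str] = toy_transliterate,
) -> str:
    """Stage 1: normalize script word by word. Stage 2: pass the result to a corrector.

    `corrector` is a stand-in for a sequence-to-sequence grammar correction model.
    """
    return corrector(normalize_script(sentence, transliterate))


if __name__ == "__main__":
    mixed = "मैं school late पहुँचा"
    # Identity corrector used only to show the data flow.
    print(correct_sentence(mixed, corrector=lambda s: s))
```

Keeping both stages as plain callables means the toy lookup can be swapped for a real transliterator and the identity corrector for a trained model without changing the routing logic. Tokens with attached punctuation are not split in this sketch, so a real pipeline would need proper tokenization first.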