Team Horizon at BHASHA Task 1: Multilingual IndicGEC with Transformer-based Grammatical Error Correction Models

Manav Dhamecha; Sunil Jaat; Gaurav Damor; Pruthwik Mishra

Abstract

This paper presents Team Horizon’s approach to the BHASHA Shared Task 1: Indic Grammatical Error Correction (IndicGEC). We explore two transformer-based multilingual models, mT5-small and IndicBART, to correct grammatical and semantic errors across five Indian languages: Bangla, Hindi, Tamil, Telugu, and Malayalam. Because annotated data is limited, we developed a synthetic data augmentation pipeline that introduces realistic linguistic errors under ten categories, simulating natural mistakes found in Indic scripts. Our fine-tuned models achieved competitive performance, with GLEU scores of 86.03 (Tamil), 72.00 (Telugu), 82.69 (Bangla), 80.44 (Hindi), and 84.36 (Malayalam). We analyze the impact of dataset scaling, multilingual fine-tuning, and the number of training epochs, showing that linguistically grounded augmentation can significantly improve grammatical correction accuracy in low-resource Indic languages.
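The synthetic augmentation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the ten error categories, so the four noise functions below (word swap, word drop, word duplication, character deletion) are illustrative assumptions rather than the authors' categories, and `make_pair` and `ERROR_FUNCS` are hypothetical names. The sketch only shows the general pattern of corrupting a clean sentence to obtain a (noisy source, clean target) training pair.

```python
import random


def swap_adjacent_words(tokens, rng):
    """Word-order error: swap one random pair of neighbouring tokens."""
    if len(tokens) < 2:
        return tokens[:]
    i = rng.randrange(len(tokens) - 1)
    out = tokens[:]
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def drop_word(tokens, rng):
    """Omission error: delete one random token (keeps at least one token)."""
    if len(tokens) < 2:
        return tokens[:]
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def duplicate_word(tokens, rng):
    """Repetition error: repeat one random token in place."""
    if not tokens:
        return tokens[:]
    i = rng.randrange(len(tokens))
    return tokens[:i + 1] + tokens[i:]


def delete_character(tokens, rng):
    """Spelling error: remove one code point from a random multi-character token.

    In Indic scripts this can drop a vowel sign or a consonant, which is a
    crude stand-in for a spelling mistake (an assumption of this sketch).
    """
    candidates = [i for i, tok in enumerate(tokens) if len(tok) > 1]
    if not candidates:
        return tokens[:]
    i = rng.choice(candidates)
    j = rng.randrange(len(tokens[i]))
    out = tokens[:]
    out[i] = out[i][:j] + out[i][j + 1:]
    return out


# Hypothetical category registry; the paper's actual ten categories may differ.
ERROR_FUNCS = {
    "word_order": swap_adjacent_words,
    "omission": drop_word,
    "repetition": duplicate_word,
    "spelling": delete_character,
}


def make_pair(clean_sentence, rng, n_errors=1):
    """Corrupt a clean sentence and return a (source, target) training pair."""
    tokens = clean_sentence.split()
    applied = []
    for _ in range(n_errors):
        name = rng.choice(sorted(ERROR_FUNCS))
        tokens = ERROR_FUNCS[name](tokens, rng)
        applied.append(name)
    return {"source": " ".join(tokens), "target": clean_sentence, "errors": applied}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the corruption is reproducible
    print(make_pair("मैं कल बाजार गया था", rng, n_errors=2))
```

Pairs produced this way could be fed to a sequence-to-sequence model with the noisy sentence as input and the clean sentence as the target. Whitespace tokenization is a simplification, and a linguistically grounded pipeline such as the one the abstract describes would presumably use script- and language-aware rules instead.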

Anthology ID: 2025.bhasha-1.14
Volume: Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)
Month: December
Year: 2025
Address: Mumbai, India
Editors: Arnab Bhattacharya, Pawan Goyal, Saptarshi Ghosh, Kripabandhu Ghosh
Venues: BHASHA | WS
Publisher: Association for Computational Linguistics
Pages: 142–146
URL: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.bhasha-1.14/
Cite (ACL): Manav Dhamecha, Sunil Jaat, Gaurav Damor, and Pruthwik Mishra. 2025. Team Horizon at BHASHA Task 1: Multilingual IndicGEC with Transformer-based Grammatical Error Correction Models. In Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025), pages 142–146, Mumbai, India. Association for Computational Linguistics.
Cite (Informal): Team Horizon at BHASHA Task 1: Multilingual IndicGEC with Transformer-based Grammatical Error Correction Models (Dhamecha et al., BHASHA 2025)
PDF: https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.bhasha-1.14.pdf
