@inproceedings{sharma-bhattacharyya-2025-indigec,
    title = "{I}ndi{GEC}: Multilingual Grammar Error Correction for Low-Resource {I}ndian Languages",
    author = "Sharma, Ujjwal  and
      Bhattacharyya, Pushpak",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1139/",
    pages = "22393--22407",
    ISBN = "979-8-89176-332-6",
    abstract = "Grammatical Error Correction (GEC) for low-resource Indic languages faces significant challenges due to the scarcity of annotated data. In this work, we introduce the Mask-Translate{\&}Fill (MTF) framework, a novel approach for generating high-quality synthetic data for GEC using only monolingual corpora. MTF leverages a machine translation system and a pretrained masked language model to introduce synthetic errors and tries to mimic errors made by second-language learners. Our experimental results on English, Hindi, Bengali, Marathi, and Tamil demonstrate that MTF consistently outperforms other monolingual synthetic data generation methods and achieves performance comparable to the Translation Language Modeling (TLM)-based approach, which uses a bilingual corpus, in both independent and multilingual settings. Under multilingual training, MTF yields significant improvements across Indic languages, with particularly notable gains in Bengali and Tamil, achieving +1.6 and +3.14 GLEU over the TLM-based method, respectively. To support further research, we also introduce the IndiGEC Corpus, a high-quality, human-written, manually validated GEC dataset for these four Indic languages, comprising over 8,000 sentence pairs with separate development and test splits."
}