Fine-tuning GEC Model Based on Language Family Corpus

Yitao Liu; Mark Dras

Abstract

"It is widely known that the first language (L1) of English learners influences their language study, causing them to make biased errors. However, research on using L1 information to improve Grammatical Error Correction (GEC) models is relatively limited. Among the existing work, a common method is to train a set of GEC models, each trained on a corpus from one (and only one) specific L1 background. This method has proven effective, but because it leaves much of the training / fine-tuning data unused, it suffers from data limitations. This paper introduces a novel method to address this issue by exploiting the linguistic similarities between a language family and its member languages. We expand the fine-tuning data from one specific L1 background to that of its language family, making the quantity increase exponentially. We use the Italic language family corpus as our language family corpus and experiment with two approaches for two situations, which differ mainly in their development data. The results show that, for the approach that uses the Italic language family corpus as the fine-tuning data and uses development data whose L1 background matches that of the test data, the GEC models improve clearly; however, the effect on the models is not uniform and varies by error type."
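The data selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `L1_FAMILY` map, the `Example` record, the function names, and the toy sentences are all hypothetical, and the abstract does not list which L1s the Italic corpus actually contains. The sketch only shows the idea of widening the fine-tuning pool from one L1 to its language family while keeping development data matched to the L1 of the test data.

```python
from dataclasses import dataclass

# Hypothetical L1 -> language family map (an assumption for illustration;
# the L1 inventory used in the paper is not given in the abstract).
L1_FAMILY = {
    "Italian": "Italic",
    "Spanish": "Italic",
    "French": "Italic",
    "German": "Germanic",
}


@dataclass(frozen=True)
class Example:
    source: str  # learner sentence
    target: str  # corrected sentence
    l1: str      # writer's first language


def select_finetune_data(examples, test_l1, use_family=True):
    """Pick fine-tuning examples for a model evaluated on `test_l1` writers.

    use_family=False mimics the single-L1 baseline; use_family=True widens
    the pool to every L1 in the same language family.
    """
    if not use_family:
        return [ex for ex in examples if ex.l1 == test_l1]
    family = L1_FAMILY[test_l1]
    return [ex for ex in examples if L1_FAMILY.get(ex.l1) == family]


def select_dev_data(held_out, test_l1):
    """Keep only held-out examples whose L1 matches the test data."""
    return [ex for ex in held_out if ex.l1 == test_l1]


# Toy corpus (invented sentences, not taken from any real learner corpus).
corpus = [
    Example("I have 20 years.", "I am 20 years old.", "Spanish"),
    Example("She explained me the rule.", "She explained the rule to me.", "Italian"),
    Example("I am agree with you.", "I agree with you.", "French"),
    Example("I become a letter yesterday.", "I received a letter yesterday.", "German"),
]

single_l1_pool = select_finetune_data(corpus, "Italian", use_family=False)
family_pool = select_finetune_data(corpus, "Italian", use_family=True)
dev_pool = select_dev_data(corpus, "Italian")

print(len(single_l1_pool), len(family_pool), len(dev_pool))
```

In a real setup the development examples would come from a held-out split that is disjoint from the fine-tuning pool; the toy corpus is reused here only to keep the sketch short.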

Anthology ID: 2025.ccl-1.70
Volume: Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
Month: August
Year: 2025
Address: Jinan, China
Editors: Maosong Sun, Peiyong Duan, Zhiyuan Liu, Ruifeng Xu, Weiwei Sun
Venue: CCL
Publisher: Chinese Information Processing Society of China
Pages: 922–933
URL: https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.70/
Cite (ACL): Yitao Liu and Mark Dras. 2025. Fine-tuning GEC Model Based on Language Family Corpus. In Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025), pages 922–933, Jinan, China. Chinese Information Processing Society of China.
Cite (Informal): Fine-tuning GEC Model Based on Language Family Corpus (Liu & Dras, CCL 2025)
PDF: https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.70.pdf
