Low-Resource Grammatical Error Correction: Selective Data Augmentation with Round-Trip Machine Translation

Frank Palma Gomez, Alla Rozovskaya


Abstract
Supervised state-of-the-art methods for grammatical error correction require large amounts of parallel data for training. Due to the lack of gold-labeled data, techniques that create synthetic training data have become popular. We show that models trained on synthetic data tend to correct a limited range of grammar and spelling mistakes that involve character-level changes, but perform poorly on (more complex) phenomena that require word-level changes. We propose to address the performance gap on such errors by generating synthetic data through selective data augmentation via round-trip machine translation. We show that the proposed technique, SeLex-RT, is capable of generating mistakes that are similar to those observed with language learners. Using the approach with two types of state-of-the-art learning frameworks and two low-resource languages (Russian and Ukrainian), we achieve substantial improvements compared to training on synthetic data produced with standard techniques. Analysis of the output reveals that models trained on data noisified with the SeLex-RT approach are capable of making word-level changes and correcting lexical errors common with language learners.
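
As a rough illustration of the general idea of round-trip translation as a noising step, the following minimal Python sketch pairs a clean sentence with its round-tripped version and keeps only pairs that differ by a small number of word-level substitutions. It is not the SeLex-RT procedure itself, whose selection criteria are not described in this abstract. Everything here is an assumption made for illustration: the function names (`round_trip`, `word_changes`, `make_pairs`), the `max_changes` threshold, and the dictionary-based toy translators that stand in for a real machine translation system.

```python
from typing import Callable, Dict, List, Tuple

# A translator is any function from a sentence to a sentence.
# In practice this would wrap a real MT system; here it is a placeholder type.
Translator = Callable[[str], str]


def round_trip(sentence: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate into a pivot language and back again."""
    return from_pivot(to_pivot(sentence))


def word_changes(a: str, b: str) -> int:
    """Count token positions that differ; return -1 if token counts differ."""
    tokens_a, tokens_b = a.split(), b.split()
    if len(tokens_a) != len(tokens_b):
        return -1
    return sum(x != y for x, y in zip(tokens_a, tokens_b))


def make_pairs(
    clean_sentences: List[str],
    to_pivot: Translator,
    from_pivot: Translator,
    max_changes: int = 2,  # hypothetical selection threshold, not from the paper
) -> List[Tuple[str, str]]:
    """Build (noisy source, clean target) pairs from clean text."""
    pairs = []
    for clean in clean_sentences:
        noisy = round_trip(clean, to_pivot, from_pivot)
        # Keep only pairs with a few word-level substitutions (assumed filter).
        if 1 <= word_changes(clean, noisy) <= max_changes:
            pairs.append((noisy, clean))
    return pairs


# Toy word-for-word "translators" standing in for an MT system (assumption).
TO_PIVOT: Dict[str, str] = {
    "she": "elle", "saw": "vu", "a": "une",
    "big": "grand", "large": "grand", "house": "maison",
}
FROM_PIVOT: Dict[str, str] = {
    "elle": "she", "vu": "seen", "une": "a",
    "grand": "large", "maison": "house",
}


def toy_to_pivot(sentence: str) -> str:
    return " ".join(TO_PIVOT.get(w, w) for w in sentence.split())


def toy_from_pivot(sentence: str) -> str:
    return " ".join(FROM_PIVOT.get(w, w) for w in sentence.split())


pairs = make_pairs(
    ["she saw a big house", "a large house"], toy_to_pivot, toy_from_pivot
)
```

In this toy run the first sentence comes back with two word-level substitutions and is kept as a training pair, while the second survives the round trip unchanged and is dropped. A real pipeline would replace the toy translators with an actual translation model and would likely need an alignment-based comparison rather than the position-wise one used here.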
Anthology ID:
2025.findings-acl.1322
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:
Findings | WS
Publisher:
Association for Computational Linguistics
Pages:
25749–25770
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.1322/
Cite (ACL):
Frank Palma Gomez and Alla Rozovskaya. 2025. Low-Resource Grammatical Error Correction: Selective Data Augmentation with Round-Trip Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 25749–25770, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Low-Resource Grammatical Error Correction: Selective Data Augmentation with Round-Trip Machine Translation (Palma Gomez & Rozovskaya, Findings 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.1322.pdf