COCOGEC: Counterfactual Generation for Robust Grammatical Error Correction

Qianyu Wang, Xiaoman Wang, Yuanyuan Liang, Xinyuan Li, Yunshi Lan


Abstract
Grammatical error correction (GEC) systems are usually trained and evaluated on GEC benchmarks, but their performance often drops sharply once the surrounding context is slightly perturbed or extended. This indicates that existing GEC models often fail to recognize error patterns in varying contexts. In this paper, we thoroughly investigate counterfactuals for GEC tasks, where subtle changes to the context can flip the label. We address this robustness gap by viewing contextual variation through the lens of counterfactual data. We propose CoCoGEC, a counterfactual generation framework that creates copies of training instances with error-irrelevant contexts altered. Our framework systematically generates counterfactuals by (1) generating intra- and inter-sentence counterfactuals that maintain the error patterns and syntax of the original instances while altering the word-level and sentence-level contexts; and (2) revising the generated counterfactuals by selecting the instances with flipped labels and a high GEC Mutual Information (MI) coefficient. Extensive experiments show that our method substantially improves the stability of GEC models, outperforming a set of data augmentation baselines. In particular, it achieves absolute F0.5 gains of +9.9, +11.3, and +20.8 points on the perturbed BEA-19*, CoNLL-14*, and TEM-8* datasets, respectively. Our code is released at https://github.com/Quinnok/CoCoGEC.
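For readers unfamiliar with the F0.5 metric reported above, the following is a minimal sketch of the standard F-beta score computed from edit-level counts. It is not taken from the CoCoGEC code; the function name `f_beta` and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration, and actual GEC scorers may handle those edge cases differently.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from counts of true-positive, false-positive and
    false-negative edits. With beta = 0.5, precision is weighted
    more heavily than recall.

    Assumption: undefined precision or recall (zero denominator)
    is treated as 0.0; real scorers may use another convention.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 correct edits, 2 spurious edits, 8 missed edits.
score = f_beta(tp=8, fp=2, fn=8)
```

Because beta is below 1, a system that proposes few but correct edits scores higher under F0.5 than under F1, which is why the metric is common in GEC evaluation.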
Anthology ID:
2026.findings-acl.195
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4004–4019
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.195/
Cite (ACL):
Qianyu Wang, Xiaoman Wang, Yuanyuan Liang, Xinyuan Li, and Yunshi Lan. 2026. COCOGEC: Counterfactual Generation for Robust Grammatical Error Correction. In Findings of the Association for Computational Linguistics: ACL 2026, pages 4004–4019, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
COCOGEC: Counterfactual Generation for Robust Grammatical Error Correction (Wang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.195.pdf
Checklist:
 2026.findings-acl.195.checklist.pdf