LM-Critic: Language Models for Unsupervised Grammatical Error Correction

Michihiro Yasunaga; Jure Leskovec; Percy Liang

doi:10.18653/v1/2021.emnlp-main.611

Abstract

Grammatical error correction (GEC) requires a set of labeled ungrammatical / grammatical sentence pairs for training, but obtaining such annotation can be prohibitively expensive. Recently, the Break-It-Fix-It (BIFI) framework has demonstrated strong results on learning to repair a broken program without any labeled examples, but this relies on a perfect critic (e.g., a compiler) that returns whether an example is valid or not, which does not exist for the GEC task. In this work, we show how to leverage a pretrained language model (LM) in defining an LM-Critic, which judges a sentence to be grammatical if the LM assigns it a higher probability than its local perturbations. We apply this LM-Critic and BIFI along with a large set of unlabeled sentences to bootstrap realistic ungrammatical / grammatical pairs for training a corrector. We evaluate our approach on GEC datasets on multiple domains (CoNLL-2014, BEA-2019, GMEG-wiki and GMEG-yahoo) and show that it outperforms existing methods in both the unsupervised setting (+7.7 F0.5) and the supervised setting (+0.5 F0.5).
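The criterion described in the abstract (a sentence counts as grammatical if the LM assigns it a higher probability than its local perturbations) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `toy_log_prob` is a hypothetical bigram scorer standing in for a pretrained LM, `toy_perturbations` is a hypothetical word-confusion perturbation function, and `lm_critic` is an illustrative name, not the authors' implementation.

```python
from typing import Callable, List

# Hypothetical confusion sets used to generate local perturbations.
CONFUSIONS = {"is": ["are"], "are": ["is"], "a": ["an"], "an": ["a"]}

# Hypothetical "fluent" bigrams; a real system would query a pretrained LM instead.
GOOD_BIGRAMS = {
    ("the", "cat"), ("cat", "is"), ("is", "an"), ("is", "a"), ("an", "animal"),
}


def toy_perturbations(sentence: str) -> List[str]:
    """Return neighbours that differ from `sentence` by one confusable word."""
    words = sentence.split()
    neighbours = []
    for i, word in enumerate(words):
        for alt in CONFUSIONS.get(word, []):
            neighbours.append(" ".join(words[:i] + [alt] + words[i + 1:]))
    return neighbours


def toy_log_prob(sentence: str) -> float:
    """Stand-in for an LM score: known bigrams cost -1.0, unknown ones -5.0."""
    words = sentence.split()
    return sum(-1.0 if pair in GOOD_BIGRAMS else -5.0
               for pair in zip(words, words[1:]))


def lm_critic(sentence: str,
              log_prob: Callable[[str], float],
              perturb: Callable[[str], List[str]]) -> bool:
    """Judge `sentence` grammatical if it outscores every local perturbation.

    A sentence with no perturbations is trivially accepted.
    """
    score = log_prob(sentence)
    return all(score > log_prob(neighbour) for neighbour in perturb(sentence))


print(lm_critic("the cat is an animal", toy_log_prob, toy_perturbations))   # True
print(lm_critic("the cat are an animal", toy_log_prob, toy_perturbations))  # False
```

In this toy setup the second sentence is rejected because one of its neighbours, "the cat is an animal", scores higher. Swapping in a real pretrained LM for `toy_log_prob` and a richer perturbation function would leave the comparison in `lm_critic` unchanged.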

Anthology ID: 2021.emnlp-main.611
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 7752–7763
URL: https://aclanthology.org/2021.emnlp-main.611
DOI: 10.18653/v1/2021.emnlp-main.611
Cite (ACL): Michihiro Yasunaga, Jure Leskovec, and Percy Liang. 2021. LM-Critic: Language Models for Unsupervised Grammatical Error Correction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7752–7763, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): LM-Critic: Language Models for Unsupervised Grammatical Error Correction (Yasunaga et al., EMNLP 2021)
PDF: https://preview.aclanthology.org/update-css-js/2021.emnlp-main.611.pdf
Code: michiyasunaga/LM-Critic
Data: CoNLL-2014 Shared Task: Grammatical Error Correction, GMEG-wiki, GMEG-yahoo, WI-LOCNESS
