A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction

Masato Mita, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, Kentaro Inui


Abstract
Existing approaches for grammatical error correction (GEC) largely rely on supervised learning with manually created GEC datasets. However, little attention has been paid to verifying and ensuring the quality of these datasets, or to how lower-quality data might affect GEC performance. In fact, we found that the datasets contain a non-negligible amount of “noise”, where errors were inappropriately edited or left uncorrected. To address this, we designed a self-refinement method whose key idea is to denoise these datasets by leveraging the prediction consistency of existing models; this method outperformed strong denoising baselines. We further applied task-specific techniques and achieved state-of-the-art performance on the CoNLL-2014, JFLEG, and BEA-2019 benchmarks. Finally, we analyzed the effect of the proposed denoising method and found that it improves the coverage of corrections and facilitates fluency edits, which is reflected in higher recall and overall performance.
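The abstract describes denoising "by leveraging the prediction consistency of existing models" only at a high level. As a rough illustration, the Python sketch below assumes one simple reading of that idea: a reference correction is treated as noisy, and replaced, when several independently trained models unanimously produce the same output and that output differs from the reference. The function name refine_dataset, the unanimity rule, and the toy string-replacement "models" are hypothetical stand-ins; the selection criterion actually used in the paper may differ.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# (source sentence, reference correction)
Pair = Tuple[str, str]


def refine_dataset(
    pairs: Sequence[Pair],
    models: Sequence[Callable[[str], str]],
) -> List[Pair]:
    """Hypothetical consistency-based denoising (an assumption, not the paper's code).

    For each training pair, every model corrects the source. If all models
    agree on one output and it differs from the reference, the reference is
    assumed to be noisy and is replaced by the consensus output; otherwise
    the original reference is kept.
    """
    if not models:
        raise ValueError("at least one model is required")
    refined: List[Pair] = []
    for source, target in pairs:
        predictions = [model(source) for model in models]
        consensus, votes = Counter(predictions).most_common(1)[0]
        if votes == len(models) and consensus != target:
            # Unanimous disagreement with the reference: treat it as noise.
            refined.append((source, consensus))
        else:
            refined.append((source, target))
    return refined


if __name__ == "__main__":
    # Toy "models" (hypothetical): simple string rewrites instead of trained GEC systems.
    def model_a(s: str) -> str:
        return s.replace("a apple", "an apple")

    def model_b(s: str) -> str:
        return s.replace("a apple", "an apple")

    # The first reference leaves an error uncorrected; the second is already correct.
    data = [
        ("I eat a apple .", "I eat a apple ."),
        ("He goes home .", "He goes home ."),
    ]
    for src, tgt in refine_dataset(data, [model_a, model_b]):
        print(f"{src!r} -> {tgt!r}")
```

Requiring unanimity is a conservative choice in this sketch: a single model's mistake cannot overwrite a reference, at the cost of leaving some noisy references untouched.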
Anthology ID:
2020.findings-emnlp.26
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2020
Month:
November
Year:
2020
Address:
Online
Editors:
Trevor Cohn, Yulan He, Yang Liu
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
267–280
URL:
https://aclanthology.org/2020.findings-emnlp.26
DOI:
10.18653/v1/2020.findings-emnlp.26
Cite (ACL):
Masato Mita, Shun Kiyono, Masahiro Kaneko, Jun Suzuki, and Kentaro Inui. 2020. A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 267–280, Online. Association for Computational Linguistics.
Cite (Informal):
A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction (Mita et al., Findings 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2020.findings-emnlp.26.pdf
Data
JFLEG