Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation

Zhaohong Wan, Xiaojun Wan, Wenguang Wang


Abstract
Incorporating data augmentation into grammatical error correction (GEC) has attracted much attention. However, existing data augmentation methods mainly apply noise to tokens, which limits the diversity of the generated errors. In view of this, we propose a new data augmentation method that applies noise to the latent representation of a sentence. By editing the latent representations of grammatical sentences, we can generate synthetic samples with various error types. Combined with some pre-defined rules, our method can greatly improve the performance and robustness of existing grammatical error correction models. We evaluate our method on public GEC benchmarks, and it achieves state-of-the-art performance on the CoNLL-2014 and FCE benchmarks.
Anthology ID:
2020.coling-main.200
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
2202–2212
URL:
https://aclanthology.org/2020.coling-main.200
DOI:
10.18653/v1/2020.coling-main.200
Cite (ACL):
Zhaohong Wan, Xiaojun Wan, and Wenguang Wang. 2020. Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2202–2212, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation (Wan et al., COLING 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2020.coling-main.200.pdf
Data
FCE