Abstract
Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic grammatical errors would be difficult, one could learn the distribution of naturally-occurring errors and attempt to introduce them into other datasets. Initial work on inducing errors in this way using statistical machine translation has shown promise; we investigate cheaply constructing synthetic samples, given a small corpus of human-annotated data, using an off-the-rack attentive sequence-to-sequence model and a straightforward post-processing procedure. Our approach yields error-filled artificial data that helps a vanilla bi-directional LSTM to outperform the previous state of the art at grammatical error detection, and a previously introduced model to gain further improvements of over 5% F0.5 score. When attempting to determine if a given sentence is synthetic, a human annotator at best achieves 39.39 F1 score, indicating that our model generates mostly human-like instances.
- Anthology ID:
- D18-1541
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4977–4983
- URL:
- https://aclanthology.org/D18-1541
- DOI:
- 10.18653/v1/D18-1541
- Cite (ACL):
- Sudhanshu Kasewa, Pontus Stenetorp, and Sebastian Riedel. 2018. Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4977–4983, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection (Kasewa et al., EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/D18-1541.pdf
- Code:
- skasewa/wronging
- Data:
- FCE
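The abstract reports error detection results as F0.5 score and the human study as F1 score. Both are instances of F-beta, F_β = (1 + β²)·P·R / (β²·P + R), where β < 1 weights precision more heavily than recall, so F0.5 rewards a detector that flags fewer tokens but is usually right. The snippet below is a minimal sketch, assuming the common token-level setup in which each token is labelled 1 (erroneous) or 0 (correct). The function `f_beta` and the example labels are hypothetical illustrations, not code from the skasewa/wronging repository.

```python
def f_beta(gold, pred, beta=0.5):
    """Token-level F-beta for binary error-detection labels (1 = erroneous token).

    Hypothetical helper for illustration; assumes gold and pred are aligned
    sequences of 0/1 labels, one per token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives and false negatives for the "erroneous" class.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical example: the detector flags one of three erroneous tokens and
# raises no false alarms (precision 1.0, recall 1/3).
gold = [0, 1, 0, 0, 1, 1, 0, 0]
pred = [0, 1, 0, 0, 0, 0, 0, 0]

print(round(f_beta(gold, pred), 3))            # F0.5 -> 0.714
print(round(f_beta(gold, pred, beta=1.0), 3))  # F1   -> 0.5
```

For this high-precision, low-recall prediction F0.5 (5/7) is noticeably higher than F1 (1/2), which is why F0.5 is the customary headline metric for error detection: a detector that flags correct text as erroneous is judged more harshly than one that misses some errors.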