JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction

Courtney Napoles, Keisuke Sakaguchi, Joel Tetreault


Abstract
We present a new parallel corpus, the JHU FLuency-Extended GUG corpus (JFLEG), for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native-sounding. We describe the types of corrections made and benchmark four leading GEC systems on this corpus, identifying specific areas in which they do well and areas in which they can improve. JFLEG fulfills the need for a new gold standard to properly assess the current state of GEC.
Anthology ID:
E17-2037
Volume:
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Mirella Lapata, Phil Blunsom, Alexander Koller
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
229–234
URL:
https://aclanthology.org/E17-2037
Cite (ACL):
Courtney Napoles, Keisuke Sakaguchi, and Joel Tetreault. 2017. JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 229–234, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction (Napoles et al., EACL 2017)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/E17-2037.pdf
Code
keisks/jfleg
Data
JFLEG, CoNLL-2014 Shared Task: Grammatical Error Correction, FCE, GUG