A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning

Yo Joong Choe, Jiyeon Ham, Kyubyong Park, Yeoil Yoon


Abstract
Grammatical error correction can be viewed as a low-resource sequence-to-sequence task, because publicly available parallel corpora are limited. To tackle this challenge, we first generate erroneous versions of large unannotated corpora using a realistic noising function. The resulting parallel corpora are subsequently used to pre-train Transformer models. Then, by sequentially applying transfer learning, we adapt these models to the domain and style of the test set. Combined with a context-aware neural spellchecker, our system achieves competitive results in both restricted and low-resource tracks in the ACL 2019 BEA Shared Task. We release all of our code and materials for reproducibility.
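To make the noising-then-pre-training idea concrete, here is a minimal, hypothetical Python sketch of corrupting clean sentences into (erroneous, clean) pairs that could serve as synthetic parallel data. It is an illustration only, not the authors' noising function (their implementation lives in kakaobrain/helo_word); the error types, function names, and probabilities below are assumptions.

```python
# Toy noising sketch (hypothetical): corrupt clean text into (noisy, clean) pairs.
# The paper's actual noiser is more realistic and error-distribution-aware.
import random

ARTICLES = ["a", "an", "the"]
PREPOSITIONS = ["in", "on", "at", "for", "to", "of", "with"]

def noise_sentence(tokens, p_drop=0.05, p_swap=0.05, p_typo=0.02, rng=random):
    """Return an 'erroneous' copy of a clean token list."""
    noisy = []
    for tok in tokens:
        low = tok.lower()
        # Drop some articles (simulates missing-determiner errors).
        if low in ARTICLES and rng.random() < p_drop:
            continue
        # Swap some prepositions for another preposition.
        if low in PREPOSITIONS and rng.random() < p_swap:
            noisy.append(rng.choice([p for p in PREPOSITIONS if p != low]))
            continue
        # Delete a character to simulate a spelling error.
        if len(tok) > 3 and rng.random() < p_typo:
            i = rng.randrange(len(tok))
            noisy.append(tok[:i] + tok[i + 1:])
            continue
        noisy.append(tok)
    return noisy

if __name__ == "__main__":
    clean = "She went to the store in the morning".split()
    rng = random.Random(0)
    noisy = noise_sentence(clean, rng=rng)
    # (noisy, clean) pairs like this can be used to pre-train a
    # sequence-to-sequence (e.g., Transformer) correction model.
    print(" ".join(noisy), "->", " ".join(clean))
```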
Anthology ID:
W19-4423
Volume:
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Helen Yannakoudakis, Ekaterina Kochmar, Claudia Leacock, Nitin Madnani, Ildikó Pilán, Torsten Zesch
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
213–227
URL:
https://aclanthology.org/W19-4423
DOI:
10.18653/v1/W19-4423
Cite (ACL):
Yo Joong Choe, Jiyeon Ham, Kyubyong Park, and Yeoil Yoon. 2019. A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 213–227, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning (Choe et al., BEA 2019)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/W19-4423.pdf
Code
kakaobrain/helo_word + additional community code
Data
WI-LOCNESS, WikiText-103, WikiText-2