GECToR – Grammatical Error Correction: Tag, Not Rewrite
Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, Oleksandr Skurzhanskyi
Abstract
In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an F_0.5 of 65.3/66.5 on CoNLL-2014 (test) and F_0.5 of 72.4/73.6 on BEA-2019 (test). Its inference speed is up to 10 times as fast as a Transformer-based seq2seq GEC system.
- Anthology ID:
- 2020.bea-1.16
- Volume:
- Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications
- Month:
- July
- Year:
- 2020
- Address:
- Seattle, WA, USA → Online
- Editors:
- Jill Burstein, Ekaterina Kochmar, Claudia Leacock, Nitin Madnani, Ildikó Pilán, Helen Yannakoudakis, Torsten Zesch
- Venue:
- BEA
- SIG:
- SIGEDU
- Publisher:
- Association for Computational Linguistics
- Pages:
- 163–170
- URL:
- https://aclanthology.org/2020.bea-1.16
- DOI:
- 10.18653/v1/2020.bea-1.16
- Cite (ACL):
- Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, and Oleksandr Skurzhanskyi. 2020. GECToR – Grammatical Error Correction: Tag, Not Rewrite. In Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 163–170, Seattle, WA, USA → Online. Association for Computational Linguistics.
- Cite (Informal):
- GECToR – Grammatical Error Correction: Tag, Not Rewrite (Omelianchuk et al., BEA 2020)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2020.bea-1.16.pdf
- Code:
- grammarly/gector
- Data:
- CoNLL, CoNLL-2014 Shared Task: Grammatical Error Correction, FCE, WI-LOCNESS
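
The abstract describes token-level transformations that map each input token to a target correction. As a rough illustration of that idea only, the sketch below applies one edit tag per source token. The function name `apply_edit_tags` and the tag names (`$KEEP`, `$DELETE`, `$APPEND_x`, `$REPLACE_x`) are assumptions made for this example and do not appear in the text above; the full system's tag set and its trained tagger are not reproduced here.

```python
from typing import List


def apply_edit_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Apply one edit tag per source token and return the corrected tokens.

    Hypothetical helper: the tag names are illustrative assumptions,
    not the system's actual tag vocabulary.
    """
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")

    corrected: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            # Leave the token unchanged.
            corrected.append(token)
        elif tag == "$DELETE":
            # Drop the token from the output.
            continue
        elif tag.startswith("$APPEND_"):
            # Keep the token and insert a new token right after it.
            corrected.append(token)
            corrected.append(tag[len("$APPEND_"):])
        elif tag.startswith("$REPLACE_"):
            # Substitute the token with the one named in the tag.
            corrected.append(tag[len("$REPLACE_"):])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return corrected


# In a real tagger the tags would be predicted by the encoder;
# here they are written by hand for the example.
source = "She go to to school yesterday".split()
tags = ["$KEEP", "$REPLACE_went", "$KEEP", "$DELETE", "$KEEP", "$KEEP"]
print(" ".join(apply_edit_tags(source, tags)))
```

Because every source token receives exactly one tag, all positions can be predicted in a single parallel pass rather than decoded token by token, which is in line with the speed advantage over seq2seq systems reported in the abstract.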