Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction

Maksym Tarnavskyi; Artem Chernodub; Kostiantyn Omelianchuk

doi:10.18653/v1/2022.acl-long.266

Abstract

In this paper, we investigate improvements to the GEC sequence tagging architecture with a focus on ensembling of recent cutting-edge Transformer-based encoders in Large configurations. We encourage ensembling models by majority votes on span-level edits because this approach is tolerant to the model architecture and vocabulary size. Our best ensemble achieves a new SOTA result with an F_0.5 score of 76.05 on BEA-2019 (test), even without pre-training on synthetic datasets. In addition, we perform knowledge distillation with a trained ensemble to generate new synthetic training datasets, “Troy-Blogs” and “Troy-1BW”. Our best single sequence tagging model that is pretrained on the generated Troy- datasets in combination with the publicly available synthetic PIE dataset achieves a near-SOTA result with an F_0.5 score of 73.21 on BEA-2019 (test). The code, datasets, and trained models are publicly available.
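The majority vote on span-level edits described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each model's output has already been converted into a set of edits against the tokenized source, where an edit is a hypothetical `(start, end, replacement_tokens)` tuple. The names `majority_vote` and `apply_edits`, the `min_votes` threshold, and the overlap-resolution rule are all illustrative assumptions. Because the vote operates on edits rather than on model logits, the models can differ in architecture and vocabulary.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

# Hypothetical edit representation: replace tokens[start:end] with `replacement`.
# An insertion has start == end; a deletion has an empty replacement.
Edit = Tuple[int, int, Tuple[str, ...]]


def _conflict(a: Edit, b: Edit) -> bool:
    """Two edits conflict if their spans share a token or are identical spans."""
    same_span = (a[0], a[1]) == (b[0], b[1])
    return same_span or (a[0] < b[1] and b[0] < a[1])


def majority_vote(edit_sets: List[Set[Edit]], min_votes: int) -> List[Edit]:
    """Keep edits proposed by at least `min_votes` models.

    Assumption: when surviving edits conflict, the one with more votes wins.
    """
    counts = Counter(edit for edits in edit_sets for edit in edits)
    candidates = [e for e, n in counts.items() if n >= min_votes]
    # Most-voted first; the edit itself breaks ties deterministically.
    candidates.sort(key=lambda e: (-counts[e], e))
    selected: List[Edit] = []
    for edit in candidates:
        if not any(_conflict(edit, kept) for kept in selected):
            selected.append(edit)
    return sorted(selected)


def apply_edits(tokens: Sequence[str], edits: List[Edit]) -> List[str]:
    """Apply non-conflicting edits right to left so earlier indices stay valid."""
    out = list(tokens)
    for start, end, replacement in sorted(edits, reverse=True):
        out[start:end] = replacement
    return out


source = ["She", "go", "to", "school", "yesterday", "."]
model_a = {(1, 2, ("went",))}
model_b = {(1, 2, ("went",)), (3, 3, ("the",))}
model_c = {(1, 2, ("goes",))}

agreed = majority_vote([model_a, model_b, model_c], min_votes=2)
corrected = apply_edits(source, agreed)
print(" ".join(corrected))
```

In this toy example only the edit `go` → `went` is proposed by two of the three models, so it is the only one applied. Raising `min_votes` in such a scheme would be expected to trade recall for precision.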

Anthology ID: 2022.acl-long.266
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 3842–3852
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.acl-long.266/
DOI: 10.18653/v1/2022.acl-long.266
Cite (ACL): Maksym Tarnavskyi, Artem Chernodub, and Kostiantyn Omelianchuk. 2022. Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3842–3852, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction (Tarnavskyi et al., ACL 2022)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.acl-long.266.pdf
Software: 2022.acl-long.266.software.zip
Video: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.acl-long.266.mp4
Code: makstarnavskyi/gector-large
Data: FCE, WI-LOCNESS
