Using Wikipedia Edits in Low Resource Grammatical Error Correction

Adriane Boyd


Abstract
We develop a grammatical error correction (GEC) system for German using a small gold GEC corpus augmented with edits extracted from Wikipedia revision history. We extend the automatic error annotation tool ERRANT (Bryant et al., 2017) for German and use it to analyze both gold GEC corrections and Wikipedia edits (Grundkiewicz and Junczys-Dowmunt, 2014) in order to select as additional training data Wikipedia edits containing grammatical corrections similar to those in the gold corpus. Using a multilayer convolutional encoder-decoder neural network GEC approach (Chollampatt and Ng, 2018), we evaluate the contribution of Wikipedia edits and find that carefully selected Wikipedia edits increase performance by over 5%.
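The selection step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes each candidate Wikipedia edit has already been annotated with ERRANT-style error type labels, and it keeps an edit only if every one of its labels also occurs in the gold corpus. The function name `select_edits`, the `min_gold_count` threshold, and the example sentences and labels are all hypothetical.

```python
from collections import Counter


def select_edits(gold_types, candidate_edits, min_gold_count=1):
    """Keep candidate edits whose error types all occur in the gold corpus.

    gold_types: list of error type labels observed in the gold corrections.
    candidate_edits: list of (source, target, types) tuples, where types is
        the list of error type labels assigned to that edit.
    min_gold_count: hypothetical threshold on how often a type must occur
        in the gold corpus for it to count as "similar".
    """
    gold_counts = Counter(gold_types)
    selected = []
    for source, target, types in candidate_edits:
        # Skip unannotated edits; require every label to be attested in gold.
        if types and all(gold_counts[t] >= min_gold_count for t in types):
            selected.append((source, target))
    return selected


# Illustrative data only; the labels imitate ERRANT-style type names.
gold_types = ["R:SPELL", "R:SPELL", "M:PUNCT", "R:ORTH"]
candidates = [
    ("Das ist ein Hauss .", "Das ist ein Haus .", ["R:SPELL"]),
    ("Er wohnt in Bonn .", "Er wohnt seit 2010 in Köln .", ["R:OTHER", "M:OTHER"]),
    ("Ich komme morgen", "Ich komme morgen .", ["M:PUNCT"]),
]
selected = select_edits(gold_types, candidates)
```

In this toy example the second candidate is dropped because it is a content change whose labels do not appear among the gold types, while the spelling and punctuation fixes are kept as additional training pairs.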
Anthology ID: W18-6111
Volume: Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text
Month: November
Year: 2018
Address: Brussels, Belgium
Editors: Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue: WNUT
Publisher: Association for Computational Linguistics
Pages: 79–84
URL: https://aclanthology.org/W18-6111
DOI: 10.18653/v1/W18-6111
Cite (ACL): Adriane Boyd. 2018. Using Wikipedia Edits in Low Resource Grammatical Error Correction. In Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pages 79–84, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Using Wikipedia Edits in Low Resource Grammatical Error Correction (Boyd, WNUT 2018)
PDF: https://preview.aclanthology.org/improve-issue-templates/W18-6111.pdf
Code: adrianeboyd/boyd-wnut2018