Noising Scheme for Data Augmentation in Automatic Post-Editing
WonKee Lee, Jaehun Shin, Baikjin Jung, Jihyung Lee, Jong-Hyeok Lee
Abstract
This paper describes POSTECH’s submission to WMT20 for the shared task on Automatic Post-Editing (APE). Our focus is on increasing the quantity of available APE data to overcome the shortage of human-crafted training data. In our experiments, we implemented a noising module that simulates four types of post-editing errors, and we introduced this module into a Transformer-based multi-source APE model. During training, the noising module implants errors into texts on the target side of parallel corpora to produce synthetic MT outputs, increasing the total number of training samples. We also generated additional training data using the parallel corpora and the NMT model released for the Quality Estimation task, and we used these data to train our APE model. Experimental results on the WMT20 English-German APE data set show improvements over the baseline in terms of both the TER and BLEU scores: our primary submission achieved an improvement of -3.15 TER and +4.01 BLEU, and our contrastive submission achieved an improvement of -3.34 TER and +4.30 BLEU.
- Anthology ID:
- 2020.wmt-1.83
- Volume:
- Proceedings of the Fifth Conference on Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 783–788
- URL:
- https://aclanthology.org/2020.wmt-1.83
- Cite (ACL):
- WonKee Lee, Jaehun Shin, Baikjin Jung, Jihyung Lee, and Jong-Hyeok Lee. 2020. Noising Scheme for Data Augmentation in Automatic Post-Editing. In Proceedings of the Fifth Conference on Machine Translation, pages 783–788, Online. Association for Computational Linguistics.
- Cite (Informal):
- Noising Scheme for Data Augmentation in Automatic Post-Editing (Lee et al., WMT 2020)
- PDF:
- https://aclanthology.org/2020.wmt-1.83.pdf
- Data
- eSCAPE
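The noising scheme described in the abstract could be sketched roughly as follows. This is an illustrative sketch only: the abstract does not enumerate the four error types, so the TER-style edit operations used here (insertion, deletion, substitution, shift), the per-token noise rate `p`, and the names `noise_tokens` and `vocab` are all assumptions rather than the paper's actual implementation.

```python
import random

def noise_tokens(tokens, vocab, p=0.1, rng=None):
    """Implant synthetic post-editing errors into a target-side token sequence.

    Hypothetical sketch: the four operations mirror the TER edit categories
    (insertion, deletion, substitution, shift); the paper's exact noising
    rules and rates are not specified in the abstract.
    """
    rng = rng or random.Random(0)
    out = list(tokens)
    ops = ["insert", "delete", "substitute", "shift"]
    i = 0
    while i < len(out):
        if rng.random() < p:
            op = rng.choice(ops)
            if op == "insert":
                out.insert(i, rng.choice(vocab))  # spurious extra token
                i += 1
            elif op == "delete" and len(out) > 1:
                del out[i]                        # dropped token
                continue
            elif op == "substitute":
                out[i] = rng.choice(vocab)        # wrong lexical choice
            elif op == "shift" and len(out) > 1:
                j = rng.randrange(len(out))       # word-order error
                out.insert(j, out.pop(i))
        i += 1
    return out

# A clean target-side sentence becomes a synthetic "MT output";
# the original sentence then serves as its post-edited reference.
reference = "der Hund läuft schnell über die Straße".split()
synthetic_mt = noise_tokens(reference, vocab=["und", "das", "ein"], p=0.3)
```

Applied on the fly during training, such a module turns each (source, target) pair of a parallel corpus into a (source, synthetic MT, target) triplet, which is how the augmentation increases the number of APE training samples without additional human post-editing.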