Noising Scheme for Data Augmentation in Automatic Post-Editing
WonKee Lee, Jaehun Shin, Baikjin Jung, Jihyung Lee, Jong-Hyeok Lee
Abstract
This paper describes POSTECH’s submission to WMT20 for the shared task on Automatic Post-Editing (APE). Our focus is on increasing the quantity of available APE data to overcome the shortage of human-crafted training data. In our experiment, we implemented a noising module that simulates four types of post-editing errors, and we introduced this module into a Transformer-based multi-source APE model. Our noising module implants errors into texts on the target side of parallel corpora during the training phase to make synthetic MT outputs, increasing the entire number of training samples. We also generated additional training data using the parallel corpora and NMT model that were released for the Quality Estimation task, and we used these data to train our APE model. Experimental results on the WMT20 English-German APE data set show improvements over the baseline in terms of both the TER and BLEU scores: our primary submission achieved an improvement of -3.15 TER and +4.01 BLEU, and our contrastive submission achieved an improvement of -3.34 TER and +4.30 BLEU.- Anthology ID:
- 2020.wmt-1.83
- Volume:
- Proceedings of the Fifth Conference on Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Editors:
- Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 783–788
- Language:
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.wmt-1.83/
- DOI:
- Cite (ACL):
- WonKee Lee, Jaehun Shin, Baikjin Jung, Jihyung Lee, and Jong-Hyeok Lee. 2020. Noising Scheme for Data Augmentation in Automatic Post-Editing. In Proceedings of the Fifth Conference on Machine Translation, pages 783–788, Online. Association for Computational Linguistics.
- Cite (Informal):
- Noising Scheme for Data Augmentation in Automatic Post-Editing (Lee et al., WMT 2020)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.wmt-1.83.pdf
- Data
- eSCAPE