IIT Bombay’s WMT22 Automatic Post-Editing Shared Task Submission

Sourabh Deoghare, Pushpak Bhattacharyya


Abstract
This paper describes IIT Bombay’s submission to the WMT22 Automatic Post-Editing (APE) shared task for the English-Marathi (En-Mr) language pair. We follow the curriculum training strategy to train our APE system. First, we train an encoder-decoder model to perform translation from English to Marathi. Next, we add another encoder to the model and train the resulting dual-encoder single-decoder model for the APE task. This involves training the model using the synthetic APE data in multiple training stages and then fine-tuning it using the real APE data. We use the LaBSE technique to ensure the quality of the synthetic APE data. For data augmentation, along with using candidates obtained from an external machine translation (MT) system, we also use the phrase-level APE triplets generated using phrase table injection. As APE systems are prone to the problem of ‘over-correction’, we use a sentence-level quality estimation (QE) system to select the final output between an original translation and the corresponding output generated by the APE model. Our approach improves the TER and BLEU scores on the development set by -3.92 and +4.36 points, respectively. Also, the final results on the test set show that our APE system outperforms the baseline system by -3.49 TER points and +5.37 BLEU points.
Anthology ID:
2022.wmt-1.67
Volume:
Proceedings of the Seventh Conference on Machine Translation (WMT)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Note:
Pages:
682–688
Language:
URL:
https://aclanthology.org/2022.wmt-1.67
DOI:
Bibkey:
Cite (ACL):
Sourabh Deoghare and Pushpak Bhattacharyya. 2022. IIT Bombay’s WMT22 Automatic Post-Editing Shared Task Submission. In Proceedings of the Seventh Conference on Machine Translation (WMT), pages 682–688, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
IIT Bombay’s WMT22 Automatic Post-Editing Shared Task Submission (Deoghare & Bhattacharyya, WMT 2022)
Copy Citation:
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.wmt-1.67.pdf