HW-TSC’s Participation in the WMT 2024 QEAPE Task

Jiawei Yu, Xiaofeng Zhao, Min Zhang, Zhao Yanqing, Yuang Li, Su Chang, Xiaosong Qiao, Ma Miaomiao, Hao Yang


Abstract
The paper presents the submission by HW-TSC to the WMT 2024 Quality-informed Automatic Post Editing (QEAPE) shared task for the English-Hindi (En-Hi) and English-Tamil (En-Ta) language pairs. We use a large language model (LLM) for En-Hi and a Transformer for En-Ta. For the LLM, we first continue pre-training Llama3 and then apply supervised fine-tuning (SFT) to the pre-trained LLM using the real APE data. For the Transformer in En-Ta, we first pre-train a Machine Translation (MT) model on MT data collected from the web, then fine-tune the model on real APE data. We also use data augmentation to enhance our model: specifically, we incorporate candidate translations obtained from an external MT system. Given that APE systems tend toward ‘over-correction’, we employ a sentence-level Quality Estimation (QE) system to select the final output, deciding between the original translation and the corresponding output generated by the APE model. Our experiments demonstrate that pre-trained MT models are effective when fine-tuned on an APE corpus of limited size, and that performance can be further improved with external MT augmentation. Our approach improves HTER by -15.99 points and -0.47 points on En-Hi and En-Ta, respectively.
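The QE-based selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the QE model or the decision rule, so everything here is an assumption: the `select_outputs` function, the `margin` parameter, and the `toy_qe` scorer are hypothetical names, and the rule "keep the APE output only if its QE score beats the original translation's score" is one plausible reading rather than the authors' documented method.

```python
from typing import Callable, List

# Hypothetical QE scorer signature: (source, hypothesis) -> quality score.
# Assumption: a higher score means a better translation.
QEScorer = Callable[[str, str], float]


def select_outputs(
    sources: List[str],
    mt_outputs: List[str],
    ape_outputs: List[str],
    qe_score: QEScorer,
    margin: float = 0.0,
) -> List[str]:
    """Choose, per sentence, between the original MT and the APE output.

    The APE output is kept only when its QE score exceeds the score of the
    original translation by more than `margin` (an assumed knob that guards
    against over-correction); otherwise the original translation is kept.
    """
    if not (len(sources) == len(mt_outputs) == len(ape_outputs)):
        raise ValueError("sources, mt_outputs and ape_outputs must align")

    selected = []
    for src, mt, ape in zip(sources, mt_outputs, ape_outputs):
        if qe_score(src, ape) > qe_score(src, mt) + margin:
            selected.append(ape)
        else:
            selected.append(mt)
    return selected


def toy_qe(source: str, hypothesis: str) -> float:
    """Stand-in scorer for illustration only, NOT a real QE model.

    It rewards hypotheses whose token count is close to the source's.
    """
    return -abs(len(source.split()) - len(hypothesis.split()))
```

In practice `toy_qe` would be replaced by a trained sentence-level QE model; the selection logic stays the same as long as the scorer returns comparable scores for both candidates.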
Anthology ID:
2024.wmt-1.40
Volume:
Proceedings of the Ninth Conference on Machine Translation
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz
Venue:
WMT
Publisher:
Association for Computational Linguistics
Pages:
541–546
URL:
https://aclanthology.org/2024.wmt-1.40
DOI:
10.18653/v1/2024.wmt-1.40
Cite (ACL):
Jiawei Yu, Xiaofeng Zhao, Min Zhang, Zhao Yanqing, Yuang Li, Su Chang, Xiaosong Qiao, Ma Miaomiao, and Hao Yang. 2024. HW-TSC’s Participation in the WMT 2024 QEAPE Task. In Proceedings of the Ninth Conference on Machine Translation, pages 541–546, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
HW-TSC’s Participation in the WMT 2024 QEAPE Task (Yu et al., WMT 2024)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2024.wmt-1.40.pdf