Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages
Sourabh Dattatray Deoghare, Diptesh Kanojia, Pushpak Bhattacharyya
Abstract
This exploratory study investigates the potential of multilingual Automatic Post-Editing (APE) systems to enhance the quality of machine translations for low-resource Indo-Aryan languages. Focusing on two closely related language pairs, English-Marathi and English-Hindi, we exploit the linguistic similarities to develop a robust multilingual APE model. To facilitate cross-linguistic transfer, we generate synthetic Hindi-Marathi and Marathi-Hindi APE triplets. Additionally, we incorporate a Quality Estimation (QE)-APE multi-task learning framework. While the experimental results underline the complementary nature of APE and QE, we also observe that QE-APE multitask learning facilitates effective domain adaptation. Our experiments demonstrate that the multilingual APE models outperform their corresponding English-Hindi and English-Marathi single-pair models by 2.5 and 2.39 TER points, respectively, with further notable improvements over the multilingual APE model observed through multi-task learning (+1.29 and +1.44 TER points), data augmentation (+0.53 and +0.45 TER points) and domain adaptation (+0.35 and +0.45 TER points). We release the synthetic data, code, and models accrued during this study publicly for further research.
- Anthology ID:
- 2024.findings-emnlp.634
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 10800–10812
- URL:
- https://aclanthology.org/2024.findings-emnlp.634
- DOI:
- 10.18653/v1/2024.findings-emnlp.634
- Cite (ACL):
- Sourabh Dattatray Deoghare, Diptesh Kanojia, and Pushpak Bhattacharyya. 2024. Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10800–10812, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages (Deoghare et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.findings-emnlp.634.pdf