Abstract
Neural Machine Translation (NMT) for low-resource languages remains a challenge for many NLP researchers. In this work, we apply a standard data augmentation method, back-translation, to a new translation direction: Cantonese-to-English. We present the models we fine-tuned using the limited amount of real data and the synthetic data we generated using back-translation, covering three models: OpusMT, NLLB, and mBART. We carried out automatic evaluation using a range of metrics, including both lexical-based and embedding-based ones. Furthermore, we create a user-friendly interface, CantonMT, for the models included in this project, and make it available to facilitate Cantonese-to-English MT research. Researchers can add more models to this platform via our open-source CantonMT toolkit, available at https://github.com/kenrickkung/CantoneseTranslation.
- Anthology ID:
- 2024.eamt-1.49
- Volume:
- Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)
- Month:
- June
- Year:
- 2024
- Address:
- Sheffield, UK
- Editors:
- Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation (EAMT)
- Pages:
- 590–599
- URL:
- https://aclanthology.org/2024.eamt-1.49
- Cite (ACL):
- Kung Hong, Lifeng Han, Riza Batista-Navarro, and Goran Nenadic. 2024. CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models using Real and Synthetic Back-Translation Data. In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), pages 590–599, Sheffield, UK. European Association for Machine Translation (EAMT).
- Cite (Informal):
- CantonMT: Cantonese to English NMT Platform with Fine-Tuned Models using Real and Synthetic Back-Translation Data (Hong et al., EAMT 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.eamt-1.49.pdf