Abstract
In this paper, we describe KreolMorisienMT, a dataset for benchmarking machine translation quality for Mauritian Creole. Mauritian Creole (Kreol Morisien) is a French-based creole and a lingua franca of the Republic of Mauritius. KreolMorisienMT consists of English–Kreol Morisien and French–Kreol Morisien parallel corpora, as well as a monolingual Kreol Morisien corpus. We first give an overview of Kreol Morisien and then describe the steps taken to create the corpora. Thereafter, we benchmark Kreol Morisien ↔ English and Kreol Morisien ↔ French translation models that leverage pre-trained models and multilingual transfer learning. Human evaluation shows that our systems achieve high translation quality.

- Anthology ID: 2022.findings-aacl.3
- Volume: Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
- Month: November
- Year: 2022
- Address: Online only
- Editors: Yulan He, Heng Ji, Sujian Li, Yang Liu, Chia-Hui Chang
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 22–29
- URL: https://aclanthology.org/2022.findings-aacl.3
- Cite (ACL): Raj Dabre and Aneerav Sukhoo. 2022. KreolMorisienMT: A Dataset for Mauritian Creole Machine Translation. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, pages 22–29, Online only. Association for Computational Linguistics.
- Cite (Informal): KreolMorisienMT: A Dataset for Mauritian Creole Machine Translation (Dabre & Sukhoo, Findings 2022)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2022.findings-aacl.3.pdf