KreolMorisienMT: A Dataset for Mauritian Creole Machine Translation

Raj Dabre, Aneerav Sukhoo


Abstract
In this paper, we describe KreolMorisienMT, a dataset for benchmarking machine translation quality of Mauritian Creole. Mauritian Creole (Kreol Morisien) is a French-based creole and a lingua franca of the Republic of Mauritius. KreolMorisienMT consists of parallel data between English and Kreol Morisien and between French and Kreol Morisien, as well as a monolingual corpus for Kreol Morisien. We first give an overview of Kreol Morisien and then describe the steps taken to create the corpora. Thereafter, we benchmark Kreol Morisien ↔ English and Kreol Morisien ↔ French models leveraging pre-trained models and multilingual transfer learning. Human evaluation reveals our systems’ high translation quality.
Anthology ID:
2022.findings-aacl.3
Volume:
Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
Month:
November
Year:
2022
Address:
Online only
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
22–29
URL:
https://aclanthology.org/2022.findings-aacl.3
Cite (ACL):
Raj Dabre and Aneerav Sukhoo. 2022. KreolMorisienMT: A Dataset for Mauritian Creole Machine Translation. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022, pages 22–29, Online only. Association for Computational Linguistics.
Cite (Informal):
KreolMorisienMT: A Dataset for Mauritian Creole Machine Translation (Dabre & Sukhoo, Findings 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.findings-aacl.3.pdf
Dataset:
 2022.findings-aacl.3.Dataset.zip