Parsing for Mauritian Creole Using Universal Dependencies

Neha Ramsurrun, Rolando Coto-Solano, Michael Gonzalez


Abstract
This paper presents a first attempt to apply Universal Dependencies (De Marneffe et al., 2021) to train a parser for Mauritian Creole (MC), a French-based Creole language spoken on the island of Mauritius. We describe the construction of a 161-sentence (1007-token) treebank for MC and evaluate the performance of a part-of-speech tagger and a Universal Dependencies parser trained on this data. The sentences were collected from publicly available grammar books (Syea, 2013) and online resources (Baker and Kriegel, 2013), as well as from government-produced school textbooks (Antonio-Françoise et al., 2021; Natchoo et al., 2017). The parser, trained with UDPipe 2 (Straka, 2018), reached F1 scores of UPOS=86.2, UAS=80.8 and LAS=69.8. These results compare favorably with models of similar size for other under-resourced Indigenous and Creole languages. We then address some of the challenges faced when applying UD to Creole languages in general and to Mauritian Creole in particular. The main challenge was the handling of spelling variation in the input. Other issues include the tagging of modal verbs, middle voice sentences, and parts of the tense-aspect-mood system (such as the particle fek).
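For readers unfamiliar with the attachment metrics reported above, the following is a minimal sketch of how UAS (correct head) and LAS (correct head and dependency label) can be computed. It assumes gold and predicted tokens are already aligned one to one, in which case the F1 score reduces to plain accuracy. The function name `attachment_scores` and the toy sentence are hypothetical illustrations, not the authors' evaluation code or data from the treebank.

```python
from typing import List, Tuple

# Each token is (head_index, dependency_label); head 0 marks the root.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages for two aligned token sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be aligned")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # UAS counts tokens whose predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the dependency label to match.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# Hypothetical 4-token sentence (assumption: not taken from the MC treebank).
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "obl")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.1f} LAS={las:.1f}")  # UAS=75.0 LAS=50.0
```

In this toy case three of four heads are correct, but only two tokens also carry the correct label, which is why LAS is never higher than UAS.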
Anthology ID:
2024.lrec-main.1105
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
12622–12632
URL:
https://aclanthology.org/2024.lrec-main.1105
Cite (ACL):
Neha Ramsurrun, Rolando Coto-Solano, and Michael Gonzalez. 2024. Parsing for Mauritian Creole Using Universal Dependencies. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 12622–12632, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Parsing for Mauritian Creole Using Universal Dependencies (Ramsurrun et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2024.lrec-main.1105.pdf