Revitalization of Indigenous Languages through Pre-processing and Neural Machine Translation: The case of Inuktitut

Tan Ngoc Le, Fatiha Sadat


Abstract
Indigenous languages are challenging for NLP tasks and applications for several reasons. Typologically, these languages are polysynthetic and highly inflected, with rich morphophonemics and variable, dialect-dependent spellings, which has affected studies on NLP tasks in recent years. Moreover, Indigenous languages are considered low-resource and/or endangered, which poses a great challenge for research in Artificial Intelligence and its subfields, such as NLP and machine learning. In this paper, we propose a study of the Inuktitut language through pre-processing and neural machine translation, with the aim of revitalizing this language, which belongs to the Inuit family, a group of polysynthetic languages spoken in Northern Canada. We focus on (1) the pre-processing phase and (2) applications to specific NLP tasks, such as morphological analysis and neural machine translation, both for Indigenous languages of Canada. Our evaluations in the context of low-resource Inuktitut-English neural machine translation show significant improvements of the proposed approach over the state of the art.
Anthology ID:
2020.coling-main.410
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
4661–4666
URL:
https://aclanthology.org/2020.coling-main.410
DOI:
10.18653/v1/2020.coling-main.410
Cite (ACL):
Tan Ngoc Le and Fatiha Sadat. 2020. Revitalization of Indigenous Languages through Pre-processing and Neural Machine Translation: The case of Inuktitut. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4661–4666, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Revitalization of Indigenous Languages through Pre-processing and Neural Machine Translation: The case of Inuktitut (Ngoc Le & Sadat, COLING 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.coling-main.410.pdf