Deep Learning-Based Morphological Segmentation for Indigenous Languages: A Study Case on Innu-Aimun
Ngoc Tan Le, Antoine Cadotte, Mathieu Boivin, Fatiha Sadat, Jimena Terraza
Abstract
Recent advances in the field of deep learning have led to a growing interest in the development of NLP approaches for low-resource and endangered languages. Nevertheless, relatively little research, related to NLP, has been conducted on indigenous languages. These languages are considered to be filled with complexities and challenges that make their study incredibly difficult in the NLP and AI fields. This paper focuses on the morphological segmentation of indigenous languages, an extremely challenging task because of polysynthesis, dialectal variations with rich morpho-phonemics, misspellings and resource-limited scenario issues. The proposed approach, towards a morphological segmentation of Innu-Aimun, an extremely low-resource indigenous language of Canada, is based on deep learning. Experiments and evaluations have shown promising results, compared to state-of-the-art rule-based and unsupervised approaches.
- Anthology ID:
- 2022.deeplo-1.16
- Volume:
- Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing
- Month:
- July
- Year:
- 2022
- Address:
- Hybrid
- Editors:
- Colin Cherry, Angela Fan, George Foster, Gholamreza (Reza) Haffari, Shahram Khadivi, Nanyun (Violet) Peng, Xiang Ren, Ehsan Shareghi, Swabha Swayamdipta
- Venue:
- DeepLo
- Publisher:
- Association for Computational Linguistics
- Pages:
- 146–151
- URL:
- https://aclanthology.org/2022.deeplo-1.16
- DOI:
- 10.18653/v1/2022.deeplo-1.16
- Cite (ACL):
- Ngoc Tan Le, Antoine Cadotte, Mathieu Boivin, Fatiha Sadat, and Jimena Terraza. 2022. Deep Learning-Based Morphological Segmentation for Indigenous Languages: A Study Case on Innu-Aimun. In Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing, pages 146–151, Hybrid. Association for Computational Linguistics.
- Cite (Informal):
- Deep Learning-Based Morphological Segmentation for Indigenous Languages: A Study Case on Innu-Aimun (Tan Le et al., DeepLo 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2022.deeplo-1.16.pdf