Overcoming Resistance: The Normalization of an Amazonian Tribal Language

John E Ortega, Richard Alexander Castro-Mamani, Jaime Rafael Montoya Samame


Abstract
Languages can be considered endangered for many reasons. One of the principal reasons for endangerment is the disappearance of its speakers. Another, more identifiable reason, is the lack of written resources. We present an automated sub-segmentation system called AshMorph that deals with the morphology of an Amazonian tribal language called Ashaninka which is at risk of being endangered due to the lack of availability (or resistance) of native speakers and the absence of written resources. We show that by the use of a cross-lingual lexicon and finite state transducers we can increase accuracy by more than 30% when compared to other modern sub-segmentation tools. Our results, made freely available on-line, are verified by an Ashaninka speaker and perform well in two distinct domains, everyday literary articles and the bible. This research serves as a first step in helping to preserve Ashaninka by offering a sub-segmentation process that can be used to normalize any Ashaninka text which will serve as input to a machine translation system for translation into other high-resource languages spoken by higher populated locations like Spanish and Portuguese in the case of Peru and Brazil where Ashaninka is mostly spoken.
Anthology ID:
2020.loresmt-1.1
Volume:
Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages
Month:
December
Year:
2020
Address:
Suzhou, China
Editors:
Alina Karakanta, Atul Kr. Ojha, Chao-Hong Liu, Jade Abbott, John Ortega, Jonathan Washington, Nathaniel Oco, Surafel Melaku Lakew, Tommi A Pirinen, Valentin Malykh, Varvara Logacheva, Xiaobing Zhao
Venue:
LoResMT
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1–13
Language:
URL:
https://aclanthology.org/2020.loresmt-1.1
DOI:
Bibkey:
Cite (ACL):
John E Ortega, Richard Alexander Castro-Mamani, and Jaime Rafael Montoya Samame. 2020. Overcoming Resistance: The Normalization of an Amazonian Tribal Language. In Proceedings of the 3rd Workshop on Technologies for MT of Low Resource Languages, pages 1–13, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Overcoming Resistance: The Normalization of an Amazonian Tribal Language (Ortega et al., LoResMT 2020)
Copy Citation:
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2020.loresmt-1.1.pdf