A finite-state morphological analyser for Paraguayan Guaraní

Anastasia Kuznetsova, Francis Tyers


Abstract
This article describes the development of a morphological analyser for Paraguayan Guaraní, an agglutinative indigenous language spoken by nearly 6 million people in South America. The implementation of our analyser uses HFST (Helsinki Finite State Technology) and a two-level transducer that covers the morphotactics and phonological processes occurring in Guaraní. We assess the efficacy of the approach on publicly available Wikipedia and Bible corpora: the naive coverage of the analyser reaches 86% on the Wikipedia corpus and 91% on the Bible corpus.
Anthology ID:
2021.americasnlp-1.9
Volume:
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
Month:
June
Year:
2021
Address:
Online
Editors:
Manuel Mager, Arturo Oncevay, Annette Rios, Ivan Vladimir Meza Ruiz, Alexis Palmer, Graham Neubig, Katharina Kann
Venue:
AmericasNLP
Publisher:
Association for Computational Linguistics
Pages:
81–89
URL:
https://aclanthology.org/2021.americasnlp-1.9
DOI:
10.18653/v1/2021.americasnlp-1.9
Cite (ACL):
Anastasia Kuznetsova and Francis Tyers. 2021. A finite-state morphological analyser for Paraguayan Guaraní. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 81–89, Online. Association for Computational Linguistics.
Cite (Informal):
A finite-state morphological analyser for Paraguayan Guaraní (Kuznetsova & Tyers, AmericasNLP 2021)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2021.americasnlp-1.9.pdf