Abstract
This paper describes the development of a free/open-source finite-state morphological transducer for Highland Puebla Nahuatl, a Uto-Aztecan language spoken in and around the state of Puebla in Mexico. The finite-state toolkit used for the work is the Helsinki Finite-State Toolkit (HFST); we use the lexc formalism for modelling the morphotactics and twol formalism for modelling morphophonological alternations. An evaluation is presented which shows that the transducer has a reasonable coverage (around 90%) on freely-available corpora of the language, and high precision (over 95%) on a manually verified test set.
- Anthology ID:
- 2023.americasnlp-1.12
- Volume:
- Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Manuel Mager, Abteen Ebrahimi, Arturo Oncevay, Enora Rice, Shruti Rijhwani, Alexis Palmer, Katharina Kann
- Venue:
- AmericasNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 103–108
- URL:
- https://aclanthology.org/2023.americasnlp-1.12
- DOI:
- 10.18653/v1/2023.americasnlp-1.12
- Cite (ACL):
- Robert Pugh and Francis Tyers. 2023. A finite-state morphological analyser for Highland Puebla Nahuatl. In Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP), pages 103–108, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- A finite-state morphological analyser for Highland Puebla Nahuatl (Pugh & Tyers, AmericasNLP 2023)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2023.americasnlp-1.12.pdf