Abstract
We study the performance of several popular neural part-of-speech taggers from the Universal Dependencies ecosystem on Mayan languages, using a small corpus of 1435 annotated K’iche’ sentences comprising approximately 10,000 tokens. The results are encouraging: F1 scores of 93% or higher on lemmatisation, part-of-speech tagging and morphological feature assignment. This high performance motivates a cross-language part-of-speech tagging study, in which K’iche’-trained models are evaluated on two other Mayan languages, Kaqchikel and Uspanteko: performance on Kaqchikel is good, 63-85%, and on Uspanteko modest, 60-71%. Supporting experiments lead us to conclude that the relative diversity of morphological features is a plausible explanation for the limits on cross-language tagging performance, which gives some direction for future sentence annotation and collection work to support these and other Mayan languages.
- Anthology ID:
- 2021.americasnlp-1.6
- Volume:
- Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Manuel Mager, Arturo Oncevay, Annette Rios, Ivan Vladimir Meza Ruiz, Alexis Palmer, Graham Neubig, Katharina Kann
- Venue:
- AmericasNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 44–52
- URL:
- https://aclanthology.org/2021.americasnlp-1.6
- DOI:
- 10.18653/v1/2021.americasnlp-1.6
- Cite (ACL):
- Francis Tyers and Nick Howell. 2021. A survey of part-of-speech tagging approaches applied to K’iche’. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pages 44–52, Online. Association for Computational Linguistics.
- Cite (Informal):
- A survey of part-of-speech tagging approaches applied to K’iche’ (Tyers & Howell, AmericasNLP 2021)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2021.americasnlp-1.6.pdf