Annotating “Particles” in Multiword Expressions in te reo Māori for a Part-of-Speech Tagger
Aoife Finn, Suzanne Duncan, Peter-Lucas Jones, Gianna Leoni, Keoni Mahelona
Abstract
This paper discusses the development of a part-of-speech tagger for te reo Māori, the Indigenous language of Aotearoa, also known as New Zealand. Te reo Māori is a particularly analytical and polysemous language. The paper introduces a word class called “particles”: small multi-functional words with many meanings, for example ē, ai, noa, rawa, mai, anō and koa. These “particles” reflect the analytical and polysemous nature of te reo Māori. They frequently occur both singly and in multiword expressions, including time adverbial phrases. The paper illustrates the challenges that they presented to part-of-speech tagging. It also discusses how we overcame these challenges in a way that is appropriate for te reo Māori, given its status as an Indigenous language and its history of colonisation. This includes a discussion of the importance of accurately reflecting the conceptualisation of te reo Māori, which involved making no linguistic presumptions and eliciting faithful judgements from speakers in a way that is uninfluenced by linguistic terminology.
- Anthology ID: 2022.mwe-1.10
- Volume: Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
- Month: June
- Year: 2022
- Address: Marseille, France
- Editors: Archna Bhatia, Paul Cook, Shiva Taslimipoor, Marcos Garcia, Carlos Ramisch
- Venue: MWE
- SIG: SIGLEX
- Publisher: European Language Resources Association
- Pages: 67–74
- URL: https://aclanthology.org/2022.mwe-1.10
- Cite (ACL): Aoife Finn, Suzanne Duncan, Peter-Lucas Jones, Gianna Leoni, and Keoni Mahelona. 2022. Annotating “Particles” in Multiword Expressions in te reo Māori for a Part-of-Speech Tagger. In Proceedings of the 18th Workshop on Multiword Expressions @LREC2022, pages 67–74, Marseille, France. European Language Resources Association.
- Cite (Informal): Annotating “Particles” in Multiword Expressions in te reo Māori for a Part-of-Speech Tagger (Finn et al., MWE 2022)
- PDF: https://preview.aclanthology.org/nschneid-patch-4/2022.mwe-1.10.pdf