Part-of-Speech Tagging for Northern Kurdish

Peshmerge Morad; Sina Ahmadi; Lorenzo Gatti

Abstract

In the growing domain of natural language processing, low-resourced languages like Northern Kurdish remain largely unexplored due to the lack of resources needed to be part of this growth. In particular, the tasks of part-of-speech (POS) tagging and tokenization for Northern Kurdish are still insufficiently addressed. In this study, we aim to bridge this gap by evaluating a range of statistical, neural, and fine-tuned models specifically tailored for Northern Kurdish. Leveraging limited but valuable datasets, including the Universal Dependency Kurmanji treebank and a novel manually annotated and tokenized gold-standard dataset consisting of 136 sentences (2,937 tokens), we evaluate several POS tagging models and report that the fine-tuned transformer-based model outperforms others, achieving an accuracy of 0.87 and a macro-averaged F1 score of 0.77. Data and models are publicly available under an open license at https://github.com/peshmerge/northern-kurdish-pos-tagging
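
The abstract reports both accuracy and a macro-averaged F1 score. As a minimal illustration of why the two can diverge for a POS tagger, the sketch below computes both on a toy tag sequence. The tag lists and the function names `accuracy` and `macro_f1` are hypothetical and are not taken from the paper's data or code; the authors' actual evaluation setup may differ.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 over all tags seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag was wrong for this token
            fn[g] += 1  # gold tag was missed for this token
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: one rare tag (INTJ) is mistagged as NOUN.
gold_tags = ["NOUN", "NOUN", "NOUN", "NOUN", "VERB", "ADP", "ADP", "INTJ"]
pred_tags = ["NOUN", "NOUN", "NOUN", "NOUN", "VERB", "ADP", "ADP", "NOUN"]

print(f"accuracy = {accuracy(gold_tags, pred_tags):.3f}")
print(f"macro F1 = {macro_f1(gold_tags, pred_tags):.3f}")
```

Because macro-averaging weights every tag equally, a single error on a rare tag lowers macro F1 far more than it lowers token-level accuracy, which is one plausible reason the two reported figures differ.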

Anthology ID: 2024.mwe-1.11
Volume: Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Archna Bhatia, Gosse Bouma, A. Seza Doğruöz, Kilian Evang, Marcos Garcia, Voula Giouli, Lifeng Han, Joakim Nivre, Alexandre Rademaker
Venues: MWE | UDW | WS
SIGs: SIGPARSE | SIGLEX
Publisher: ELRA and ICCL
Pages: 70–80
URL: https://preview.aclanthology.org/remove-affiliations/2024.mwe-1.11/
Cite (ACL): Peshmerge Morad, Sina Ahmadi, and Lorenzo Gatti. 2024. Part-of-Speech Tagging for Northern Kurdish. In Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, pages 70–80, Torino, Italia. ELRA and ICCL.
Cite (Informal): Part-of-Speech Tagging for Northern Kurdish (Morad et al., MWE-UDW 2024)
PDF: https://preview.aclanthology.org/remove-affiliations/2024.mwe-1.11.pdf
