Fine-tuning XLM-RoBERTa for Named Entity Recognition in Kurmanji Kurdish

Hossein Hassani


Abstract
Named Entity Recognition (NER) is the information extraction task of identifying predefined categories of named entities, such as person, location, and organization names. NER has advanced considerably for high-resource languages. Low-resource languages such as Kurmanji Kurdish have not seen the same progress, because far less data is available for them online. This research aims to narrow that gap by developing an NER system that fine-tunes XLM-RoBERTa on a manually annotated Kurmanji dataset. The dataset consists of 7,919 sentences annotated by three native Kurmanji speakers with three entity classes: Person (PER), Organization (ORG), and Location (LOC). A web-based application built with Streamlit makes the model more accessible. The model achieved an F1 score of 0.8735, a precision of 0.8668, and a recall of 0.8803, demonstrating the effectiveness of fine-tuning transformer-based models for NER in low-resource languages. This work establishes a methodology that can be applied to other low-resource languages and Kurdish varieties.
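As a quick consistency check on the reported scores, the sketch below recomputes F1 as the harmonic mean of precision and recall. It assumes the abstract's F1 is the standard F1 derived from the reported precision and recall; the abstract does not say how the scores were averaged, and the function name `f1_from_precision_recall` is illustrative, not taken from the paper.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
precision = 0.8668
recall = 0.8803

f1 = f1_from_precision_recall(precision, recall)
print(f"{f1:.4f}")  # prints 0.8735
```

Under that assumption, the recomputed value agrees with the reported F1 of 0.8735 to four decimal places.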
Anthology ID:
2025.winlp-main.8
Volume:
Proceedings of the 9th Widening NLP Workshop
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Chen Zhang, Emily Allaway, Hua Shen, Lesly Miculicich, Yinqiao Li, Meryem M'hamdi, Peerat Limkonchotiwat, Richard He Bai, Santosh T.y.s.s., Sophia Simeng Han, Surendrabikram Thapa, Wiem Ben Rim
Venues:
WiNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
41–45
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.winlp-main.8/
Cite (ACL):
Hossein Hassani. 2025. Fine-tuning XLM-RoBERTa for Named Entity Recognition in Kurmanji Kurdish. In Proceedings of the 9th Widening NLP Workshop, pages 41–45, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Fine-tuning XLM-RoBERTa for Named Entity Recognition in Kurmanji Kurdish (Hassani, WiNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.winlp-main.8.pdf