KLPT – Kurdish Language Processing Toolkit

Sina Ahmadi


Abstract
Despite recent advances in applying language-independent approaches to various natural language processing tasks, thanks to artificial intelligence, some language-specific tools remain essential for processing a language in a viable manner. Kurdish is a less-resourced language with remarkable diversity in dialects and scripts, and it lacks basic language processing tools. To address this issue, we introduce a language processing toolkit that handles this diversity efficiently. Our toolkit is composed of fundamental components such as text preprocessing, stemming, tokenization, lemmatization and transliteration, and can be extended by future developers. The project is publicly available.
Anthology ID:
2020.nlposs-1.11
Volume:
Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)
Month:
November
Year:
2020
Address:
Online
Venues:
EMNLP | NLPOSS
Publisher:
Association for Computational Linguistics
Pages:
72–84
URL:
https://aclanthology.org/2020.nlposs-1.11
DOI:
10.18653/v1/2020.nlposs-1.11
Cite (ACL):
Sina Ahmadi. 2020. KLPT – Kurdish Language Processing Toolkit. In Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS), pages 72–84, Online. Association for Computational Linguistics.
Cite (Informal):
KLPT – Kurdish Language Processing Toolkit (Ahmadi, NLPOSS 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.nlposs-1.11.pdf
Video:
https://slideslive.com/38939750