Abstract
Despite recent advances in artificial intelligence that allow language-independent approaches to be applied to various natural language processing tasks, some language-specific tools are still essential for processing a language in a viable manner. Kurdish is a less-resourced language with remarkable diversity in dialects and scripts, and it lacks basic language processing tools. To address this issue, we introduce a language processing toolkit that handles this diversity efficiently. Our toolkit is composed of fundamental components such as text preprocessing, stemming, tokenization, lemmatization and transliteration, and it can be further extended by future developers. The project is publicly available.
- Anthology ID: 2020.nlposs-1.11
- Volume: Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)
- Month: November
- Year: 2020
- Address: Online
- Venue: NLPOSS
- Publisher: Association for Computational Linguistics
- Pages: 72–84
- URL: https://aclanthology.org/2020.nlposs-1.11
- DOI: 10.18653/v1/2020.nlposs-1.11
- Cite (ACL): Sina Ahmadi. 2020. KLPT – Kurdish Language Processing Toolkit. In Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS), pages 72–84, Online. Association for Computational Linguistics.
- Cite (Informal): KLPT – Kurdish Language Processing Toolkit (Ahmadi, NLPOSS 2020)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2020.nlposs-1.11.pdf