Nisheeth Joshi


2021

Part of Speech Tagging for a Resource Poor Language : Sindhi in Devanagari Script using HMM and CRF
Bharti Nathani | Nisheeth Joshi
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Part-of-speech (POS) tagging is a pre-processing step in various NLP applications and is mainly used in machine translation. In this paper, we develop two POS taggers for Sindhi in Devanagari script, a resource-poor language: one based on a Hidden Markov Model (HMM) and one based on a Conditional Random Field (CRF). To develop these taggers, a corpus of 30,000 manually annotated sentences was prepared with the help of language experts. In evaluation, the HMM and CRF taggers achieved accuracies of 76.60714% and 88.79%, respectively.