Parts of Speech Tagging for Kannada

Swaroop L R, Rakshith Gowda G S, Sourabh U, Shriram Hegde


Abstract
Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag to every word in a sentence. In this paper, we present a POS tagger for Kannada, a low-resource South Asian language, using Conditional Random Fields. The tagger uses novel features native to the Kannada language. These features include Sandhi splitting, in which a compound word is broken down into two or more meaningful constituent words. The proposed model is trained and tested on a tagged dataset of 21 thousand sentences and achieves a best accuracy of 94.56%.
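As a rough illustration of how Sandhi splitting could feed a CRF-style tagger, the sketch below builds one feature dictionary per token, which is the usual input shape for CRF toolkits. It is not the authors' implementation: the function names, the feature set, and the toy lookup table `TOY_SANDHI_SPLITS` (standing in for a real Sandhi splitter, with a romanized example entry) are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy lookup standing in for a real Sandhi splitter.
# The single romanized entry is illustrative only.
TOY_SANDHI_SPLITS: Dict[str, List[str]] = {
    "devalaya": ["deva", "alaya"],
}


def sandhi_split(word: str) -> List[str]:
    """Return the constituent words of a compound, or [word] if unknown."""
    return TOY_SANDHI_SPLITS.get(word, [word])


def word_features(sentence: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i."""
    word = sentence[i]
    parts = sandhi_split(word)
    features: Dict[str, object] = {
        "bias": 1.0,
        "word": word,
        "prefix3": word[:3],
        "suffix3": word[-3:],
        "is_digit": word.isdigit(),
        # Sandhi-derived features (assumed feature names).
        "n_sandhi_parts": len(parts),
        "sandhi_first": parts[0],
        "sandhi_last": parts[-1],
        "BOS": i == 0,
        "EOS": i == len(sentence) - 1,
    }
    # Context features from neighbouring tokens, when they exist.
    if i > 0:
        features["prev_word"] = sentence[i - 1]
    if i < len(sentence) - 1:
        features["next_word"] = sentence[i + 1]
    return features


def sentence_features(sentence: List[str]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in a sentence."""
    return [word_features(sentence, i) for i in range(len(sentence))]
```

A sequence of such dictionaries, paired with the gold tag sequence for the same sentence, is what a linear-chain CRF trainer would typically consume. The last constituent of a split compound is exposed as its own feature here on the assumption that it is more informative about the tag than the surface form alone.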
Anthology ID:
R19-2005
Volume:
Proceedings of the Student Research Workshop Associated with RANLP 2019
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Venelin Kovatchev, Irina Temnikova, Branislava Šandrih, Ivelina Nikolova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
28–31
URL:
https://aclanthology.org/R19-2005
DOI:
10.26615/issn.2603-2821.2019_005
Cite (ACL):
Swaroop L R, Rakshith Gowda G S, Sourabh U, and Shriram Hegde. 2019. Parts of Speech Tagging for Kannada. In Proceedings of the Student Research Workshop Associated with RANLP 2019, pages 28–31, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Parts of Speech Tagging for Kannada (L R et al., RANLP 2019)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/R19-2005.pdf