Abstract
Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag to each word in a sentence. In this paper, we present a POS tagger for Kannada, a low-resource South Asian language, using Conditional Random Fields. The tagger uses novel features specific to the Kannada language, including Sandhi splitting, in which a compound word is broken down into two or more meaningful constituent words. The proposed model is trained and tested on a tagged dataset of 21,000 sentences and achieves an accuracy of up to 94.56%.
- Anthology ID:
- R19-2005
- Volume:
- Proceedings of the Student Research Workshop Associated with RANLP 2019
- Month:
- September
- Year:
- 2019
- Address:
- Varna, Bulgaria
- Editors:
- Venelin Kovatchev, Irina Temnikova, Branislava Šandrih, Ivelina Nikolova
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 28–31
- URL:
- https://aclanthology.org/R19-2005
- DOI:
- 10.26615/issn.2603-2821.2019_005
- Cite (ACL):
- Swaroop L R, Rakshith Gowda G S, Sourabh U, and Shriram Hegde. 2019. Parts of Speech Tagging for Kannada. In Proceedings of the Student Research Workshop Associated with RANLP 2019, pages 28–31, Varna, Bulgaria. INCOMA Ltd.
- Cite (Informal):
- Parts of Speech Tagging for Kannada (L R et al., RANLP 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/R19-2005.pdf
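The abstract describes feeding Kannada-specific features, including Sandhi splitting, to a Conditional Random Fields tagger. The following is a minimal, hypothetical sketch of what such per-token features could look like, written in the list-of-dicts-per-sentence format that CRF toolkits such as sklearn-crfsuite accept. It is not the authors' implementation: the toy lexicon `LEXICON`, the single reversed ādēśa rule in `ADESHA_REVERSE` (ಕ surfacing as ಗ, as in ಮಳೆ + ಕಾಲ → ಮಳೆಗಾಲ), and the functions `sandhi_split`, `word2features` and `sent2features` are all assumptions introduced for illustration.

```python
# Hypothetical toy lexicon; a real system would need a full Kannada dictionary.
LEXICON = {"ಮಳೆ", "ಕಾಲ", "ಬಂದಿತು"}

# Assumed single adesha rule: an initial ಕ of the second word surfaces as ಗ
# after joining, so we map ಗ back to ಕ when trying to recover the original word.
ADESHA_REVERSE = {"ಗ": "ಕ"}


def sandhi_split(word, lexicon=LEXICON):
    """Return (left, right) if word splits into two lexicon words, else None."""
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left not in lexicon:
            continue
        # Plain concatenation: no sound change at the boundary.
        if right in lexicon:
            return left, right
        # Undo the assumed consonant change at the boundary and retry.
        restored = ADESHA_REVERSE.get(right[0])
        if restored and restored + right[1:] in lexicon:
            return left, restored + right[1:]
    return None


def word2features(sent, i):
    """Build a CRF feature dict for token i of a tokenized sentence."""
    word = sent[i]
    feats = {
        "bias": 1.0,
        "word": word,
        # Code-point affixes; grapheme-aware or morphological affixes may work better.
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "suffix3": word[-3:],
        "BOS": i == 0,
        "EOS": i == len(sent) - 1,
    }
    if i > 0:
        feats["-1:word"] = sent[i - 1]
    if i < len(sent) - 1:
        feats["+1:word"] = sent[i + 1]
    # Sandhi-splitting features: flag compounds and expose their constituents.
    split = sandhi_split(word)
    feats["sandhi.is_compound"] = split is not None
    if split:
        feats["sandhi.left"], feats["sandhi.right"] = split
    return feats


def sent2features(sent):
    """Convert one tokenized sentence into a list of per-token feature dicts."""
    return [word2features(sent, i) for i in range(len(sent))]


if __name__ == "__main__":
    sentence = ["ಮಳೆಗಾಲ", "ಬಂದಿತು"]
    for feats in sent2features(sentence):
        print(feats["word"], feats["sandhi.is_compound"],
              feats.get("sandhi.left"), feats.get("sandhi.right"))
```

Passing `[sent2features(s) for s in sentences]` together with the matching tag sequences to a CRF trainer would be the natural next step. A practical splitter would need many more Sandhi rules and a much larger lexicon than this sketch assumes.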