Part-of-Speech Annotation Challenges in Marathi
Gajanan Rane, Nilesh Joshi, Geetanjali Rane, Hanumant Redkar, Malhar Kulkarni, Pushpak Bhattacharyya
Abstract
Part-of-speech (POS) annotation is a significant challenge in natural language processing. This paper discusses the issues and challenges faced during POS annotation of Marathi data from four domains: tourism, health, entertainment, and agriculture. Many issues were encountered during annotation; the major ones are discussed in detail. Two approaches to POS tagging, the lexical (L approach) and the functional (F approach), are also discussed and illustrated with examples. Finally, some ambiguous cases in POS annotation are presented.
- Anthology ID:
- 2020.wildre-1.1
- Volume:
- Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Girish Nath Jha, Kalika Bali, Sobha L., S. S. Agrawal, Atul Kr. Ojha
- Venue:
- WILDRE
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1–6
- Language:
- English
- URL:
- https://aclanthology.org/2020.wildre-1.1
- Cite (ACL):
- Gajanan Rane, Nilesh Joshi, Geetanjali Rane, Hanumant Redkar, Malhar Kulkarni, and Pushpak Bhattacharyya. 2020. Part-of-Speech Annotation Challenges in Marathi. In Proceedings of the WILDRE5– 5th Workshop on Indian Language Data: Resources and Evaluation, pages 1–6, Marseille, France. European Language Resources Association (ELRA).
- Cite (Informal):
- Part-of-Speech Annotation Challenges in Marathi (Rane et al., WILDRE 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/2020.wildre-1.1.pdf