Abstract
Code-mixed text sequences often lead to challenges in the task of correct identification of Part-Of-Speech tags. However, lexical dependencies created while alternating between multiple languages can be leveraged to improve the performance of such tasks. Indian languages with rich morphological structure and highly inflected nature provide such an opportunity. In this work, we exploit these sub-label dependencies using conditional random fields (CRFs) by defining feature extraction functions on three distinct language pairs (Hindi-English, Bengali-English, and Telugu-English). Our results demonstrate a significant increase in the tagging performance if the feature extraction functions employ the rich inner structure of such languages.

- Anthology ID:
- 2022.wildre-1.3
- Volume:
- Proceedings of the WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Girish Nath Jha, Sobha L., Kalika Bali, Atul Kr. Ojha
- Venue:
- WILDRE
- Publisher:
- European Language Resources Association
- Pages:
- 13–17
- URL:
- https://aclanthology.org/2022.wildre-1.3
- Cite (ACL):
- Akash Kumar Gautam. 2022. Leveraging Sub Label Dependencies in Code Mixed Indian Languages for Part-Of-Speech Tagging using Conditional Random Fields. In Proceedings of the WILDRE-6 Workshop within the 13th Language Resources and Evaluation Conference, pages 13–17, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Leveraging Sub Label Dependencies in Code Mixed Indian Languages for Part-Of-Speech Tagging using Conditional Random Fields. (Gautam, WILDRE 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2022.wildre-1.3.pdf
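
The abstract refers to feature extraction functions that use the inner structure of words as input to a CRF tagger. The sketch below illustrates what such a function might look like in Python. It is an assumption for illustration only, not the paper's actual feature set: the names `token_features` and `sentence_features`, the affix length limit, and the example sentence are all hypothetical. It emits one feature dictionary per token, a shape that CRF toolkits commonly accept.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int, max_affix: int = 3) -> Dict[str, object]:
    """Build a feature dictionary for tokens[i] (hypothetical feature set)."""
    word = tokens[i]
    lower = word.lower()
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": lower,
        # ASCII vs. non-ASCII is a rough cue for romanized vs. native-script text.
        "is_ascii": word.isascii(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    # Prefixes and suffixes serve as a cheap proxy for inflectional morphology.
    for n in range(1, max_affix + 1):
        if len(lower) > n:
            feats[f"prefix{n}"] = lower[:n]
            feats[f"suffix{n}"] = lower[-n:]
    # Neighbouring tokens give the tagger context across language switches.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
        feats["prev_is_ascii"] = tokens[i - 1].isascii()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
        feats["next_is_ascii"] = tokens[i + 1].isascii()
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Apply token_features to every position in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Hypothetical romanized Hindi-English example sentence.
    for feats in sentence_features(["mujhe", "movie", "pasand", "aayi"]):
        print(feats)
```

Affix features of this kind are one simple way to expose sub-word regularities to a sequence model; the feature functions used in the paper may differ.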