Named Entity Recognition for Code-Mixed Kannada-English Social Media Data

Poojitha Nandigam, Abhinav Appidi, Manish Shrivastava


Abstract
Named Entity Recognition (NER) is a critical task in Natural Language Processing (NLP) and a sub-task of Information Extraction. A significant amount of work has been done on entity extraction and NER for resource-rich languages. Entity extraction from code-mixed social media data, such as tweets from Twitter, is harder because the text is unstructured and informal and the information it carries is often incomplete. Here, we present work on NER for a Kannada-English code-mixed social media corpus annotated with named entity tags for Organisation (Org), Person (Pers), and Location (Loc). We experimented with machine learning models such as Conditional Random Fields (CRF), Bi-LSTM, and Bi-LSTM-CRF on our corpus.
Anthology ID:
2022.icon-main.5
Volume:
Proceedings of the 19th International Conference on Natural Language Processing (ICON)
Month:
December
Year:
2022
Address:
New Delhi, India
Editors:
Md. Shad Akhtar, Tanmoy Chakraborty
Venue:
ICON
Publisher:
Association for Computational Linguistics
Pages:
43–49
URL:
https://aclanthology.org/2022.icon-main.5
Cite (ACL):
Poojitha Nandigam, Abhinav Appidi, and Manish Shrivastava. 2022. Named Entity Recognition for Code-Mixed Kannada-English Social Media Data. In Proceedings of the 19th International Conference on Natural Language Processing (ICON), pages 43–49, New Delhi, India. Association for Computational Linguistics.
Cite (Informal):
Named Entity Recognition for Code-Mixed Kannada-English Social Media Data (Nandigam et al., ICON 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.icon-main.5.pdf