Abhinav Reddy Appidi
2020
Creation of Corpus and Analysis in Code-Mixed Kannada-English Social Media Data for POS Tagging
Abhinav Reddy Appidi | Vamshi Krishna Srirangam | Darsi Suhas | Manish Shrivastava
Proceedings of the 17th International Conference on Natural Language Processing (ICON)
Part-of-Speech (POS) tagging is one of the essential tasks for many Natural Language Processing (NLP) applications. There has been a significant amount of work done in POS tagging for resource-rich languages. POS tagging is an essential phase of text analysis in understanding the semantics and context of language. These tags are useful for higher-level tasks such as building parse trees, which can be used for Named Entity Recognition, Coreference Resolution, Sentiment Analysis, and Question Answering. There has been work done on code-mixed social media corpora, but not on POS tagging of Kannada-English code-mixed data. Here, we present a Kannada-English code-mixed social media corpus annotated with corresponding POS tags. We also experimented with CRF, Bi-LSTM, and Bi-LSTM-CRF classification models on our corpus.
Creation of Corpus and analysis in Code-Mixed Kannada-English Twitter data for Emotion Prediction
Abhinav Reddy Appidi | Vamshi Krishna Srirangam | Darsi Suhas | Manish Shrivastava
Proceedings of the 28th International Conference on Computational Linguistics
Emotion prediction is a critical task in the field of Natural Language Processing (NLP). There has been a significant amount of work done in emotion prediction for resource-rich languages. There has been work done on code-mixed social media corpora, but not on emotion prediction for Kannada-English code-mixed Twitter data. In this paper, we analyze the problem of emotion prediction on a corpus of code-mixed Kannada-English tweets extracted from Twitter, each annotated with its respective ‘Emotion’. We experimented with machine learning prediction models using features such as Character N-Grams, Word N-Grams, Repetitive characters, and others with SVM and LSTM on our corpus, which resulted in accuracies of 30% and 32%, respectively.