Sai Prashanth Karnati
2024
Findings of the Shared Task on Hate and Offensive Language Detection in Telugu Codemixed Text (HOLD-Telugu)@DravidianLangTech 2024
Premjith B | Bharathi Raja Chakravarthi | Prasanna Kumar Kumaresan | Saranya Rajiakodi | Sai Prashanth Karnati | Sai Rishith Reddy Mangamuru | Chandu Janakiram
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper summarizes the submissions of the participating teams to the shared task on Hate and Offensive Language Detection in Telugu Codemixed Text (HOLD-Telugu), organized as part of DravidianLangTech 2024. The shared task challenges researchers and academics to build models that identify content carrying harmful information in Telugu code-mixed social media text. The dataset for the task was created by gathering YouTube comments and annotating them manually. A total of 23 teams participated and submitted their results. The rank list was created by assessing the submitted results using the macro F1-score.
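Since the rank list hinges on the macro F1-score, the sketch below shows how macro-averaged F1 weights each class equally regardless of frequency; the binary label encoding and the score_submission helper are illustrative assumptions, not the organizers' actual evaluation script.

```python
# Minimal sketch of macro F1 scoring, the metric used to rank HOLD-Telugu
# submissions. The binary label encoding and helper name are assumptions,
# not the organizers' evaluation script.
from sklearn.metrics import f1_score

def score_submission(gold_labels, predicted_labels):
    # Macro averaging computes F1 per class and takes the unweighted mean,
    # so a minority class (e.g. hate/offensive) counts as much as the majority.
    return f1_score(gold_labels, predicted_labels, average="macro")

gold = [1, 0, 0, 1, 1, 0]  # hypothetical gold labels (1 = hate/offensive)
pred = [1, 0, 1, 1, 0, 0]  # hypothetical system predictions
print(f"Macro F1: {score_submission(gold, pred):.4f}")
```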
2023
Enhancing Telugu Part-of-Speech Tagging with Deep Sequential Models and Multilingual Embeddings
Sai Rishith Reddy Mangamuru | Sai Prashanth Karnati | Bala Karthikeya Sajja | Divith Phogat | Premjith B
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Part-of-speech (POS) tagging is a fundamental task in natural language processing (NLP) that involves assigning grammatical categories to words in a sentence. In this study, we investigate the application of deep sequential models to POS tagging of Telugu, a low-resource Dravidian language with rich morphology. We use the Universal Dependencies dataset for this research and explore various deep learning architectures, including Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), and their stacked variants. Additionally, we utilize multilingual BERT embeddings and IndicBERT embeddings to capture contextual information from the input sequences. Our experiments demonstrate that a stacked LSTM with multilingual BERT embeddings achieves the highest performance, outperforming the other approaches with an F1 score of 0.8812. These findings suggest that deep sequential models, particularly stacked LSTMs with multilingual BERT embeddings, are effective tools for POS tagging in Telugu.
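A minimal PyTorch sketch of the best-reported configuration, a stacked LSTM over multilingual BERT embeddings, is given below; the hidden size, layer count, frozen encoder, and 17-tag Universal Dependencies tagset are illustrative assumptions rather than the paper's exact hyperparameters.

```python
# Sketch of a stacked-LSTM POS tagger over multilingual BERT embeddings,
# the configuration the abstract reports as best. Hidden size, layer count,
# the frozen encoder, and the 17-tag UD tagset are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class StackedLSTMTagger(nn.Module):
    def __init__(self, num_tags, hidden_size=256, num_layers=2):
        super().__init__()
        # mBERT supplies contextual subword embeddings; kept frozen here.
        self.encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
        for p in self.encoder.parameters():
            p.requires_grad = False
        # "Stacked" LSTM: num_layers > 1 feeds each layer's output to the next.
        self.lstm = nn.LSTM(
            input_size=self.encoder.config.hidden_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
        )
        self.classifier = nn.Linear(hidden_size, num_tags)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            embeddings = self.encoder(
                input_ids=input_ids, attention_mask=attention_mask
            ).last_hidden_state
        lstm_out, _ = self.lstm(embeddings)
        return self.classifier(lstm_out)  # per-subword tag logits

# Hypothetical usage on a Telugu sentence, assuming the 17 universal POS tags.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = StackedLSTMTagger(num_tags=17)
batch = tokenizer(["నేను పుస్తకం చదువుతున్నాను"], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # torch.Size([1, seq_len, 17])
```

Note that the logits here are per subword token; in practice they would still need to be aligned back to word-level tags, a step omitted from this sketch for brevity.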