Sunil Saumya

2024

IIITDWD_SVC@DravidianLangTech-2024: Breaking Language Barriers; Hate Speech Detection in Telugu-English Code-Mixed Text
Chava Sai | Rangoori Kumar | Sunil Saumya | Shankar Biradar
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Social media platforms have become increasingly popular and are used for a wide range of purposes, including product promotion, news sharing, and sharing accomplishments. However, they are also used for defamatory speech, intimidation, and the propagation of untruths about particular groups of people. Because hateful and offensive posts spread quickly and often have a negative impact on people, it is important to identify and remove them from social media platforms as soon as possible. Over the past few years, research on detecting hate speech and offensive content has grown in popularity. One of the many difficulties in identifying hate speech on social media platforms is the use of code-mixed language: most people who use social media share their messages in code-mixed languages such as Telugu-English. To encourage research in this direction, the organizers of DravidianLangTech@EACL-2024 conducted a shared task to identify hateful content in Telugu-English code-mixed text. Our team participated in this shared task, employing three different models: XLM-RoBERTa, BERT, and Hate-BERT. In particular, our BERT-based model secured the 14th rank in the competition with a macro F1 score of 0.65.
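For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a frequent one. The following is a minimal pure-Python sketch of that definition; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper or the shared task data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted wrongly
            fn[t] += 1      # class t was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only.
gold = ["hate", "hate", "non-hate", "non-hate", "non-hate", "non-hate"]
pred = ["hate", "non-hate", "non-hate", "non-hate", "non-hate", "hate"]
print(macro_f1(gold, pred))
```

In this toy case the per-class F1 scores are 0.5 ("hate") and 0.75 ("non-hate"), so the macro average is 0.625 even though four of the six predictions are correct.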

IIITDWD-zk@DravidianLangTech-2024: Leveraging the Power of Language Models for Hate Speech Detection in Telugu-English Code-Mixed Text
Zuhair Shaik | Sai Kartheek Reddy Kasu | Sunil Saumya | Shankar Biradar
Proceedings of the Fourth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Hateful online content is a growing concern, especially for young people. While social media platforms aim to connect us, they can also become breeding grounds for negativity and harmful language. This study tackles this issue by proposing a novel framework called HOLD-Z, specifically designed to detect hate and offensive comments in Telugu-English code-mixed social media content. HOLD-Z combines three models: an LSTM architecture, Zypher, and openchat_3.5. The study highlights the effectiveness of prompt engineering and Quantized Low-Rank Adaptation (QLoRA) in boosting performance. Notably, HOLD-Z secured 9th place in the HOLD-Telugu DravidianLangTech@EACL-2024 shared task, showcasing its potential for tackling the complexities of hate and offensive comment classification.

2023

IIITDWD@LT-EDI-2023 Unveiling Depression: Using pre-trained language models for Harnessing Domain-Specific Features and Context Information
Shankar Biradar | Sunil Saumya | Sanjana Kavatagi
Proceedings of the Third Workshop on Language Technology for Equality, Diversity and Inclusion

Depression has become a common health problem impacting millions of individuals globally. Workplace stress and unhealthy lifestyles have increased in recent years, leading to a rise in the number of people experiencing depressive symptoms. The spread of the epidemic has further exacerbated the problem. Early detection and precise prediction of depression are critical for early intervention and support for individuals at risk. However, due to the social stigma associated with the illness, many people are afraid to consult healthcare specialists, making early detection practically impossible. As a result, alternative strategies for depression prediction are being investigated, one of which is analyzing users’ social media posting behaviour. The organizers of LT-EDI@RANLP carried out a shared task to encourage research in this area. Our team participated in the shared task and secured 21st rank with a macro F1 score of 0.36. This article provides a summary of the model presented in the shared task.

2022

IIITDWD@TamilNLP-ACL2022: Transformer-based approach to classify abusive content in Dravidian Code-mixed text
Shankar Biradar | Sunil Saumya
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages

Identifying abusive content or hate speech in social media text has attracted the research community’s interest in recent times. The major driving force behind this is the widespread use of social media websites. This has also drawn attention to identifying abusive content in low-resource regional languages, which is an important research problem in computational linguistics. As part of ACL-2022, the organizers of DravidianLangTech@ACL 2022 released a shared task on abusive category identification in Tamil and Tamil-English code-mixed text to encourage further research on offensive content identification in low-resource Indic languages. This paper presents the working notes for the model submitted by IIITDWD at DravidianLangTech@ACL 2022. Our team competed in Sub-Task B and finished in 9th place among the participating teams. In our proposed approach, we used a pre-trained transformer model such as Indic-bert for feature extraction, and on top of that, an SVM classifier is used for stance detection. Our model achieved 62% accuracy on code-mixed Tamil-English text.

CURAJ_IIITDWD@LT-EDI-ACL 2022: Hope Speech Detection in English YouTube Comments using Deep Learning Techniques
Vanshita Jha | Ankit Mishra | Sunil Saumya
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Hope speech consists of positive terms that help to promote or criticise a point of view without hurting the user’s or community’s feelings. Non-hope speech, on the other hand, includes expressions that are harsh, ridiculing, or demotivating. The goal of this article is to find the hope speech comments in a YouTube dataset. The datasets were created as part of the “LT-EDI-ACL 2022: Hope Speech Detection for Equality, Diversity, and Inclusion” shared task. The shared task dataset was provided in Malayalam, Tamil, English, Spanish, and Kannada. In this paper, we worked on English-language YouTube comments. We employed several deep learning based models such as DNN (dense or fully connected neural network), CNN (Convolutional Neural Network), Bi-LSTM (Bidirectional Long Short Term Memory Network), and GRU (Gated Recurrent Unit) to identify the hopeful comments. We also trained stacked LSTM-CNN and stacked LSTM-LSTM networks. The best macro average F1-score on the development dataset, 0.67, was obtained using the DNN model. A macro average F1-score of 0.67 was achieved on the test data as well.

SOA_NLP@LT-EDI-ACL2022: An Ensemble Model for Hope Speech Detection from YouTube Comments
Abhinav Kumar | Sunil Saumya | Pradeep Roy
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Language should be accommodating of equality and diversity as a fundamental aspect of communication. The language of internet users has a big impact on peer users all over the world. On virtual platforms such as Facebook, Twitter, and YouTube, people express their opinions in different languages. People respect others’ accomplishments, pray for their well-being, and cheer them on when they fail. Such motivational remarks are hope speech remarks. At the same time, a group of users encourages discrimination against women, people of color, people with disabilities, and other minorities based on gender, race, sexual orientation, and other factors. To recognize hope speech in YouTube comments, the current study offers an ensemble approach that combines support vector machine, logistic regression, and random forest classifiers. Extensive testing was carried out to discover the best features for these classifiers. In the support vector machine and logistic regression classifiers, char-level TF-IDF features were used, whereas in the random forest classifier, word-level features were used. The proposed ensemble model performed well on English, Spanish, Tamil, Malayalam, and Kannada YouTube comments.
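The abstract does not say how the three classifiers' outputs are combined. One common option, assumed here purely for illustration, is hard (majority) voting over the predicted labels. The sketch below is pure Python; the function name `majority_vote`, the label strings, and the per-model predictions are all hypothetical and do not come from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per example."""
    combined = []
    # zip(*...) groups the models' predictions for the same example together
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count;
        # with three voters and two labels, as in this toy data, no tie can occur
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from the three classifiers on four comments.
svm_pred = ["hope", "not-hope", "hope", "not-hope"]
lr_pred = ["hope", "hope", "hope", "not-hope"]
rf_pred = ["not-hope", "not-hope", "hope", "hope"]
print(majority_vote([svm_pred, lr_pred, rf_pred]))
```

With more than two labels a three-way tie becomes possible, so a real system would need an explicit tie-breaking rule or would average predicted probabilities instead.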

Are you a hero or a villain? A semantic role labelling approach for detecting harmful memes.
Shaik Fharook | Syed Sufyan Ahmed | Gurram Rithika | Sumith Sai Budde | Sunil Saumya | Shankar Biradar
Proceedings of the Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situations

Identifying good and evil through representations of victimhood, heroism, and villainy (i.e., role labeling of entities) has recently caught the research community’s interest. Because of the growing popularity of memes, the amount of offensive information published on the internet is expanding at an alarming rate. This has created a greater need to address the issue and analyze memes for content moderation. Framing is used to show the entities involved as heroes, villains, victims, or others so that readers may better anticipate and understand their attitudes and behaviors as characters. Positive phrases are used to characterize heroes, whereas negative terms depict victims and villains, and terms that tend to be neutral are mapped to others. In this paper, we propose two approaches to role-label the entities of a meme as hero, villain, victim, or other through Named-Entity Recognition (NER), Sentiment Analysis, etc. With an F1-score of 23.855, our team secured eighth position in the Shared Task @ Constraint 2022.

2021

IIIT_DWD@LT-EDI-EACL2021: Hope Speech Detection in YouTube multilingual comments
Sunil Saumya | Ankit Kumar Mishra
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

Language, as a significant part of communication, should be inclusive of equality and diversity. The language of internet users has a huge influence on peer users all over the world. People express their views through language on virtual platforms like Facebook, Twitter, and YouTube. People admire the success of others, pray for their well-being, and encourage them when they fail. Such inspirational comments are hope speech comments. At the same time, a group of users promotes discrimination based on gender, race, sexual orientation, disability, and minority status. The current paper aims to identify hope speech comments, which help people move on in life. Various machine learning and deep learning based models (such as support vector machine, logistic regression, convolutional neural network, and recurrent neural network) are employed to identify hope speech in the given YouTube comments. The YouTube comments are available in English, Tamil, and Malayalam and are part of the task “EACL-2021: Hope Speech Detection for Equality, Diversity and Inclusion”.

Offensive language identification in Dravidian code mixed social media text
Sunil Saumya | Abhinav Kumar | Jyoti Prakash Singh
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Hate speech and offensive language recognition on social media platforms has been an active field of research in recent years. In countries where English is not the native language, social media texts are mostly in code-mixed or script-mixed/switched form. The current study presents extensive experiments using multiple machine learning, deep learning, and transfer learning models to detect offensive content on Twitter. The datasets used for this study are Tanglish (Tamil and English) code-mixed, Manglish (Malayalam and English) code-mixed, and Malayalam script-mixed. The experimental results showed that 1 to 6-gram character TF-IDF features work better for this task. The best performing models were naive Bayes, logistic regression, and a vanilla neural network for the Tamil code-mixed, Malayalam code-mixed, and Malayalam script-mixed datasets, respectively, rather than more popular transfer learning models such as BERT and ULMFiT or hybrid deep models.
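To make "1 to 6-gram character TF-IDF features" concrete, the sketch below extracts every character n-gram of length 1 through 6 from each text and weights its count by an inverse document frequency. It is a pure-Python illustration under stated assumptions: the function names, the smoothed IDF formula ln((1 + N) / (1 + df)) + 1, the lack of vector normalization, and the two romanized example strings are choices made here, not details reported in the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=6):
    """All character n-grams of length n_min..n_max, shortest first."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n_min=1, n_max=6):
    """Map each document to a sparse {ngram: tf * idf} dictionary."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs
    df = Counter(g for c in counts for g in c)
    n_docs = len(docs)
    # Smoothed IDF (an assumption of this sketch); n-grams found in every
    # document get weight 1.0, rarer n-grams get more
    idf = {g: math.log((1 + n_docs) / (1 + k)) + 1.0 for g, k in df.items()}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]


# Invented romanized examples, for illustration only.
docs = ["super da", "super movie"]
vectors = tfidf_vectors(docs)
print(len(vectors[0]), len(vectors[1]))
```

Character n-grams do not depend on a tokenizer or a fixed vocabulary, which is one plausible reason they hold up on code-mixed text with inconsistent romanized spelling.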

IIIT_DWD@EACL2021: Identifying Troll Meme in Tamil using a hybrid deep learning approach
Ankit Kumar Mishra | Sunil Saumya
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Social media are an open forum that allows people to share their knowledge, abilities, talents, ideas, or expressions. At the same time, they also allow people to post disrespectful, trolling, defamatory, or negative content targeting users or communities based on their gender, race, religious beliefs, etc. Such posts appear in the form of text, images, videos, and memes. Among them, memes, which primarily combine pictures and text, are currently widely used to disseminate offensive material. In the present paper, troll memes are identified, which is necessary to create a healthy society. To do so, a hybrid deep learning model combining convolutional neural networks and bidirectional long short-term memory is proposed to identify troll memes. The dataset used in the study is part of the competition EACL 2021: Troll Meme classification in Tamil. The proposed model obtained 10th rank in the competition and reported a precision of 0.52, a recall of 0.59, and a weighted F1 of 0.3.