Arunima S.
Also published as: Arunima S
2021
ssn_diBERTsity@LT-EDI-EACL2021: Hope Speech Detection on multilingual YouTube comments via transformer based approach
Arunima S | Akshay Ramakrishnan | Avantika Balaji | Thenmozhi D. | Senthil Kumar B
Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion
In recent times, there is an abundance of research on classifying abusive and offensive texts focusing on negative comments, but only minimal research that takes the positive reinforcement approach. The task was aimed at classifying texts into 'Hope_speech', 'Non_hope_speech', and 'Not in language'. The datasets were provided by the LT-EDI organisers in English, Tamil, and Malayalam, with texts sourced from YouTube comments. We trained our models using transformer architectures, specifically mBERT for Tamil and Malayalam and BERT for English, and achieved weighted average F1-scores of 0.46, 0.81, and 0.92 for Tamil, Malayalam, and English respectively.
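A minimal sketch of the kind of fine-tuning pipeline the abstract describes, assuming the Hugging Face `transformers` library and the standard `bert-base-multilingual-cased` checkpoint for mBERT; the file names, hyperparameters, and data layout below are illustrative assumptions, not the authors' exact setup.

```python
# Hypothetical sketch: fine-tuning mBERT for three-way hope speech classification.
# Assumes CSV files with "text" and "label" columns; names and hyperparameters are
# illustrative, not taken from the paper.
import pandas as pd
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

LABELS = ["Hope_speech", "Non_hope_speech", "Not in language"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}

class CommentDataset(Dataset):
    """Tokenised YouTube comments with integer class labels."""
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(texts, truncation=True,
                             padding="max_length", max_length=max_len)
        self.labels = [LABEL2ID[label] for label in labels]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# mBERT for Tamil/Malayalam; "bert-base-cased" would be the analogous English choice.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(LABELS))

train_df = pd.read_csv("tamil_train.csv")   # hypothetical file name
dev_df = pd.read_csv("tamil_dev.csv")       # hypothetical file name
train_ds = CommentDataset(train_df["text"].tolist(), train_df["label"].tolist(), tokenizer)
dev_ds = CommentDataset(dev_df["text"].tolist(), dev_df["label"].tolist(), tokenizer)

args = TrainingArguments(output_dir="hope-speech-mbert", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=dev_ds)
trainer.train()
```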
2020
Ssn_nlp at SemEval 2020 Task 12: Offense Target Identification in Social Media Using Traditional and Deep Machine Learning Approaches
Thenmozhi D. | Nandhinee P.r. | Arunima S. | Amlan Sengupta
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Offensive language identification (OLI) in user-generated text is the automatic detection of any profanity, insult, obscenity, racism, or vulgarity addressed towards an individual or a group. Due to the immense growth and usage of social media, such content has an extensive reach and impact on society. OLI supports hate speech detection, flame detection, and cyberbullying prevention, and is therefore used to curb abuse and harm. In this paper, we present state-of-the-art machine learning approaches for OLI. We follow several approaches, including classifiers such as Naive Bayes and Support Vector Machine (SVM) and deep learning approaches such as Recurrent Neural Network (RNN) and Masked LM (MLM). The approaches are evaluated on the OffensEval@SemEval2020 dataset, and our team ssn_nlp submitted runs for the third task of the OffensEval shared task. The best run of ssn_nlp, which uses BERT (Bidirectional Encoder Representations from Transformers) to train the OLI model, obtained an F1 score of 0.61. The model performs with an accuracy of 0.80 and an evaluation loss of 1.0828. The model has a precision of 0.72 and a recall of 0.58.
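A small sketch of how the reported classification metrics (accuracy, precision, recall, F1) could be computed with scikit-learn; the gold and predicted labels below are toy data, and the three target classes (IND, GRP, OTH) are the standard OffensEval Sub-task C categories rather than values taken from the paper.

```python
# Illustrative metric computation for the offense target identification task.
# y_true / y_pred are hypothetical gold and predicted labels, not real system output.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["IND", "GRP", "IND", "OTH", "GRP", "IND"]
y_pred = ["IND", "IND", "IND", "OTH", "GRP", "GRP"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```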
Co-authors
- Thenmozhi D. 2
- Akshay Ramakrishnan 1
- Avantika Balaji 1
- Senthil Kumar B. 1
- Nandhinee P.r. 1
- Amlan Sengupta 1