Hrushikesh Patil


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2022

pdf bib
L3Cube-MahaHate: A Tweet-based Marathi Hate Speech Detection Dataset and BERT Models
Hrushikesh Patil | Abhishek Velankar | Raviraj Joshi
Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying (TRAC 2022)

Social media platforms are used by a large number of people prominently to express their thoughts and opinions. However, these platforms have contributed to a sub stantial amount of hateful and abusive content as well. Therefore, it is impor tant to curb the spread of hate speech on these platforms. In India, Marathi is one of the most popular languages used by a wide audience. In this work, we present L3Cube-MahaHate, the first ma jor Hate Speech Dataset in Marathi. The dataset is curated from Twitter, anno tated manually. Our dataset consists of over 00 distinct tweets labeled into four major classes i.e hate, offensive, pro fane, and not. We present the approaches used for collecting and annotating the data and the challenges faced during the pro cess. Finally, we present baseline classi fication results using deep learning mod els based on CNN, LSTM, and Transform ers. We explore mono-lingual and multi lingual variants of BERT like MahaBERT, IndicBERT, mBERT, and xlm-RoBERTa and show that mono-lingual models per form better than their multi-lingual coun terparts. The MahaBERT model provides the best results on L3Cube-MahaHate Corpus.