ItsAllGoodMan@LT-EDI-2025: Fusing TF-IDF and MuRIL Embeddings for Detecting Caste and Migration Hate Speech
Amritha Nandini K L, Vishal S, Giri Prasath R, Anerud Thiyagarajan, Sachin Kumar S
Abstract
Caste and migration hate speech detection is a critical task in the context of increasingly multilingual and diverse online discourse. In this work, we address the problem of identifying hate speech targeting caste and migrant communities across a multilingual social media dataset containing Tamil, Tamil written in English script, and English. We explore and compare different feature representations, including TF-IDF vectors and embeddings from pretrained transformer-based models, to train various machine learning classifiers. Our experiments show that a Soft Voting Classifier that uses both TF-IDF vectors and MuRIL embeddings performs best, achieving a macro F1 score of 0.802 on the test set. This approach was evaluated as part of the Shared Task on Caste and Migration Hate Speech Detection at LT-EDI@LDK 2025, where it ranked 6th overall.
- Anthology ID:
- 2025.ltedi-1.15
- Volume:
- Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
- Month:
- September
- Year:
- 2025
- Address:
- Naples, Italy
- Editors:
- Katerina Gkirtzou, Slavko Žitnik, Jorge Gracia, Dagmar Gromann, Maria Pia di Buono, Johanna Monti, Maxim Ionov
- Venues:
- LTEDI | WS
- Publisher:
- Unior Press
- Pages:
- 90–94
- URL:
- https://preview.aclanthology.org/corrections-2025-10/2025.ltedi-1.15/
- Cite (ACL):
- Amritha Nandini K L, Vishal S, Giri Prasath R, Anerud Thiyagarajan, and Sachin Kumar S. 2025. ItsAllGoodMan@LT-EDI-2025: Fusing TF-IDF and MuRIL Embeddings for Detecting Caste and Migration Hate Speech. In Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion, pages 90–94, Naples, Italy. Unior Press.
- Cite (Informal):
- ItsAllGoodMan@LT-EDI-2025: Fusing TF-IDF and MuRIL Embeddings for Detecting Caste and Migration Hate Speech (L et al., LTEDI 2025)
- PDF:
- https://preview.aclanthology.org/corrections-2025-10/2025.ltedi-1.15.pdf
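
Illustrative sketch (not from the paper): the abstract describes a Soft Voting Classifier that combines TF-IDF vectors and MuRIL embeddings. The general idea of soft voting, averaging the class probabilities produced by several classifiers and choosing the most probable class, might look roughly like the following. The function name `soft_vote`, the equal default weights, and all probability values are assumptions made for illustration; the paper's actual classifiers, weights, and features are not reproduced here.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    model_probs: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[List[float]], List[int]]:
    """Fuse per-class probabilities from several models by weighted averaging.

    model_probs[m][i][c] is model m's probability that sample i has class c.
    Returns the fused probabilities and the argmax label for each sample.
    """
    n_models = len(model_probs)
    if n_models == 0:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * n_models  # assumption: equal weight per model
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")

    total = float(sum(weights))
    fused: List[List[float]] = []
    labels: List[int] = []
    for i in range(len(model_probs[0])):
        n_classes = len(model_probs[0][i])
        # Weighted mean of each class probability across the models.
        row = [
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs)) / total
            for c in range(n_classes)
        ]
        fused.append(row)
        # Predicted label is the class with the highest fused probability.
        labels.append(max(range(n_classes), key=row.__getitem__))
    return fused, labels


# Hypothetical outputs of two classifiers on three posts
# (class 0 = not hate speech, class 1 = hate speech).
tfidf_probs = [[0.90, 0.10], [0.40, 0.60], [0.55, 0.45]]  # TF-IDF based model
muril_probs = [[0.70, 0.30], [0.20, 0.80], [0.30, 0.70]]  # MuRIL-embedding model

fused, labels = soft_vote([tfidf_probs, muril_probs])
print(labels)  # [0, 1, 1]
```

On the third hypothetical post the two models disagree, and the more confident one decides the fused label; this is the main behavioral difference from majority (hard) voting, which would see a tie.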