@inproceedings{selvamurugan-2025-dravlingua,
    title = "{D}rav{L}ingua@{D}ravidian{L}ang{T}ech 2025: Multimodal Hate Speech Detection in {D}ravidian Languages using Late Fusion of Muril and {W}av2{V}ec Models",
    author = "Selvamurugan, Aishwarya",
    editor = "Chakravarthi, Bharathi Raja  and
      Priyadharshini, Ruba  and
      Madasamy, Anand Kumar  and
      Thavareesan, Sajeetha  and
      Sherly, Elizabeth  and
      Rajiakodi, Saranya  and
      Palani, Balasubramanian  and
      Subramanian, Malliga  and
      Cn, Subalalitha  and
      Chinnappa, Dhivya",
    booktitle = "Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages",
    month = may,
    year = "2025",
    address = "Acoma, The Albuquerque Convention Center, Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.dravidianlangtech-1.118/",
    doi = "10.18653/v1/2025.dravidianlangtech-1.118",
    pages = "694--699",
    ISBN = "979-8-89176-228-2",
    abstract = "Detecting hate speech on social media is increasingly difficult, particularly in low-resource Dravidian languages such as Tamil, Telugu, and Malayalam. Traditional approaches primarily rely on text-based classification, often overlooking the multimodal nature of online communication, where speech plays a pivotal role in spreading hate speech. We propose a multimodal hate speech detection model using a late fusion technique that integrates Wav2Vec 2.0 for speech processing and MuRIL for text analysis. Our model is evaluated on the DravidianLangTech@NAACL 2025 dataset, which contains speech and text data in Telugu, Tamil, and Malayalam scripts. The dataset is categorized into six classes: Non-Hate, Gender Hate, Political Hate, Religious Hate, Religious Defamation, and Personal Defamation. To address class imbalance, we incorporate class weighting and data augmentation techniques. Experimental results demonstrate that the late fusion approach effectively captures patterns of hate speech that may be missed when analyzing a single modality. This highlights the importance of multimodal strategies in enhancing hate speech detection, particularly for low-resource languages."
}