Abstract
The emergence of social media and e-commerce platforms has enabled perpetrators to rapidly spread negativity and abuse individuals or organisations worldwide. It is critical to detect hate speech in both visual and textual content so that it can be moderated or removed from online platforms, keeping them sound and safe for users. However, multimodal hate speech detection is a complex and challenging task because people often present hate speech sarcastically and because the content involves different modalities, i.e., image and text. This paper describes our participation in the CASE 2023 multimodal hate speech event detection task. The objective of this task is to automatically detect hate speech and its target in a given text-embedded image. We propose a transformer-based multimodal hierarchical fusion model to detect hate speech present in visual content. We jointly fine-tune a pre-trained language transformer and a pre-trained vision transformer to extract visual-contextualized feature representations of the text-embedded image. We concatenate these features and feed them to a multi-sample dropout strategy. Moreover, the contextual feature vector is fed into a BiLSTM module, and the output of the BiLSTM module is also passed to multi-sample dropout. We employ arithmetic mean fusion to combine all of the sample dropout outputs and predict the final label. Experimental results demonstrate that our model obtains competitive performance and ranked 5th among the participants.
- Anthology ID:
- 2023.case-1.14
- Volume:
- Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text
- Month:
- September
- Year:
- 2023
- Address:
- Varna, Bulgaria
- Editors:
- Ali Hürriyetoğlu, Hristo Tanev, Vanni Zavarella, Reyyan Yeniterzi, Erdem Yörük, Milena Slavcheva
- Venues:
- CASE | WS
- SIG:
- Publisher:
- INCOMA Ltd., Shoumen, Bulgaria
- Note:
- Pages:
- 101–107
- Language:
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2023.case-1.14/
- DOI:
- Cite (ACL):
- Abdul Aziz, MD. Akram Hossain, and Abu Nowshed Chy. 2023. CSECU-DSG@Multimodal Hate Speech Event Detection 2023: Transformer-based Multimodal Hierarchical Fusion Model For Multimodal Hate Speech Detection. In Proceedings of the 6th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text, pages 101–107, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal):
- CSECU-DSG@Multimodal Hate Speech Event Detection 2023: Transformer-based Multimodal Hierarchical Fusion Model For Multimodal Hate Speech Detection (Aziz et al., CASE 2023)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2023.case-1.14.pdf
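
The fusion steps named in the abstract (feature concatenation, multi-sample dropout, and arithmetic mean fusion) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. All function names, the toy feature vectors, and the classifier weights are hypothetical, and the BiLSTM is replaced by a caller-supplied `sequence_encoder` placeholder (identity by default) because the abstract does not give its configuration.

```python
import random

def dropout(vec, p, rng):
    """Inverted dropout: zero each element with probability p, rescale the rest."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in vec]

def linear(vec, weights, bias):
    """Dense layer producing one logit per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def multi_sample_dropout(vec, weights, bias, rates, rng):
    """Apply several dropout masks to the same features and classify each sample."""
    return [linear(dropout(vec, p, rng), weights, bias) for p in rates]

def mean_fusion(logit_sets):
    """Element-wise arithmetic mean over a list of logit vectors."""
    n = len(logit_sets)
    return [sum(column) / n for column in zip(*logit_sets)]

def predict(text_feat, image_feat, head_a, head_b, rates, rng,
            sequence_encoder=None):
    """Hypothetical two-branch pipeline; each head is a (weights, bias) pair."""
    fused = text_feat + image_feat  # concatenate text and image features
    branch_a = multi_sample_dropout(fused, *head_a, rates, rng)
    # Placeholder for the BiLSTM branch (assumption: identity if not supplied).
    encoded = sequence_encoder(fused) if sequence_encoder else fused
    branch_b = multi_sample_dropout(encoded, *head_b, rates, rng)
    logits = mean_fusion(branch_a + branch_b)  # fuse all dropout samples
    label = max(range(len(logits)), key=logits.__getitem__)
    return label, logits

# Toy usage with made-up features and weights (2 classes, 3 features).
rng = random.Random(0)
head_a = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0])
head_b = ([[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]], [0.0, 1.0])
label, logits = predict([1.0, 2.0], [3.0], head_a, head_b, [0.1, 0.3, 0.5], rng)
```

Averaging the logits of several dropout samples means each prediction draws on multiple randomly masked views of the same fused features. In a real system the features would come from the fine-tuned language and vision transformers rather than hand-written lists.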