Jishnu Bandyopadhyay


2026

Social media has become an important platform for communication and interaction, but the amount of abusive and harmful content online has grown with it. Offensive language and hate speech make these platforms less safe and less welcoming for users. Much of this content consists of homophobic and transphobic remarks aimed at the LGBT+ community. Such behaviour damages healthy discussion and can harm users, so it is important to detect this content early so that it can be flagged and removed. The problem becomes harder when harmful messages appear in popular formats such as memes, which younger users widely use to communicate online. Because memes combine images and text, detecting offensive meaning is challenging. In this work, we address this problem using the meme dataset released for the LT-EDI 2026 challenge. We propose a zero-shot learning method that employs two LLMs (Qwen2.5-VL-3B-Instruct and Meta-Llama-3-8B-Instruct) to generate descriptions of the memes and classify them. Our method achieved a macro F1-score of 0.55 on the English-language memes and ranked 5th in the shared task.
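The describe-then-classify idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a vision-language model produces the description and that a text-only model assigns the label, which the abstract does not spell out. The label set, the prompt wording, and the names `build_classification_prompt`, `parse_label`, and `classify_meme` are all hypothetical, and the two models are passed in as plain callables so the sketch runs without downloading anything.

```python
from typing import Callable, Sequence

# Hypothetical label set, chosen only for illustration; the shared task's
# actual labels may differ.
LABELS = ("homophobic", "transphobic", "none")


def build_classification_prompt(description: str,
                                labels: Sequence[str] = LABELS) -> str:
    """Compose a zero-shot prompt: instructions only, no labelled examples."""
    options = ", ".join(labels)
    return (
        "You are a content moderation assistant.\n"
        f"Meme description: {description}\n"
        f"Answer with exactly one label from: {options}."
    )


def parse_label(reply: str,
                labels: Sequence[str] = LABELS,
                default: str = "none") -> str:
    """Return the first known label found in the reply, else the default.

    This is a naive substring match; a reply such as "not homophobic"
    would still be mapped to "homophobic".
    """
    text = reply.lower()
    for label in labels:
        if label in text:
            return label
    return default


def classify_meme(image_path: str,
                  describe: Callable[[str], str],
                  generate: Callable[[str], str]) -> str:
    """Two-stage zero-shot pipeline (assumed split of responsibilities).

    describe: wraps a vision-language model, image path -> text description.
    generate: wraps a text LLM, prompt -> free-form reply.
    """
    description = describe(image_path)
    reply = generate(build_classification_prompt(description))
    return parse_label(reply)


# Stand-in callables so the sketch runs without any model weights.
def fake_describe(path: str) -> str:
    return "A cartoon with a caption mocking a trans person."


def fake_generate(prompt: str) -> str:
    return "Label: Transphobic"


prediction = classify_meme("meme.png", fake_describe, fake_generate)
```

In practice the two stand-ins would be replaced by wrappers around the actual models. Keeping prompt construction and reply parsing as separate functions makes each step easy to inspect when a prediction looks wrong.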
The rapid growth of social media has been accompanied by a rise in abusive and harmful content, which degrades the online environment for users. Frequent offensive language and hate speech make these platforms increasingly hostile. In particular, homophobic and transphobic remarks target members of the LGBT+ community. Detecting such content is therefore essential so that it can be flagged promptly and appropriate warnings can be issued to the users responsible. The problem becomes more serious when such content appears in forms of communication popular with younger generations, such as memes. This work addresses that issue. We propose a method to detect such content using the meme dataset from the LT-EDI 2026 challenge. Our approach is multimodal, processing both image and text information. Because the dataset is small, we pre-fine-tune the models on a similar dataset, PrideMM. The proposed approach achieved macro F1-scores of 0.24 and 0.57 for English and Chinese memes respectively, ranking 8th for English and 6th for Chinese in the shared task.
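Both abstracts report macro F1-scores. As a worked illustration, the sketch below assumes the standard definition of macro F1 (the unweighted mean of per-class F1 scores); the shared task's official scorer may differ in details. The function name `macro_f1` and the toy labels are made up for this example and are not taken from the dataset.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); denom is never zero here because each
        # label occurs at least once in y_true or y_pred.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced example (illustrative data only).
y_true = ["hate", "hate", "none", "none", "none", "none"]
y_pred = ["none", "hate", "none", "none", "none", "hate"]

score = macro_f1(y_true, y_pred)
# "hate": TP=1, FP=1, FN=1 -> F1 = 0.50
# "none": TP=3, FP=1, FN=1 -> F1 = 0.75
# macro F1 = (0.50 + 0.75) / 2 = 0.625
```

Because every class contributes equally to the average, weak performance on a rare class lowers the score more than it lowers plain accuracy, which is 4/6 in this toy example.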