Muhammad Ibrahim Khan


2025

PhantomTroupe at ImageEval 2025 Shared Task: Multimodal Arabic Image Captioning through Translation-Based Fine-Tuning of LLM Models
Muhammad Abu Horaira | Farhan Amin | Sakibul Hasan | Md. Tanvir Ahammed Shawon | Muhammad Ibrahim Khan
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

KCRL@DravidianLangTech 2025: Multi-Pooling Feature Fusion with XLM-RoBERTa for Malayalam Fake News Detection and Classification
Fariha Haq | Md. Tanvir Ahammed Shawon | Md Ayon Mia | Golam Sarwar Md. Mursalin | Muhammad Ibrahim Khan
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

The rapid spread of misinformation on social media platforms necessitates robust detection mechanisms, particularly for languages with limited computational resources. This paper presents our system for the DravidianLangTech 2025 shared task on Fake News Detection in Malayalam YouTube comments, addressing both binary and multiclass classification challenges. We propose a Multi-Pooling Feature Fusion (MPFF) architecture that leverages a [CLS] + Mean + Max pooling strategy with transformer models. Our system demonstrates strong performance across both tasks, achieving a macro-averaged F1 score of 0.874 (6th place) in binary classification and 0.628 (1st place) in multiclass classification. Experimental results show that our MPFF approach with XLM-RoBERTa significantly outperforms traditional machine learning and deep learning baselines, particularly excelling in the more challenging multiclass scenario. These findings highlight the effectiveness of our methodology in capturing nuanced linguistic features for fake news detection in Malayalam, contributing to the advancement of automated verification systems for Dravidian languages.
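
As a rough illustration of the pooling-fusion approach described in the abstract, the following is a minimal sketch assuming a standard Hugging Face XLM-RoBERTa encoder; the hidden size, dropout, classifier head, and sample input are illustrative assumptions, not the authors' reported configuration.

# Sketch: [CLS] + Mean + Max pooling fusion over XLM-RoBERTa (illustrative only).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiPoolingFusionClassifier(nn.Module):
    def __init__(self, model_name="xlm-roberta-base", num_labels=2, dropout=0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.dropout = nn.Dropout(dropout)
        # Three pooled views are concatenated, so the classifier sees 3 * hidden features.
        self.classifier = nn.Linear(3 * hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                      # (batch, seq_len, hidden)
        mask = attention_mask.unsqueeze(-1).float()

        cls_pool = hidden_states[:, 0]           # sentence-start ([CLS]-style) token
        mean_pool = (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        max_pool = hidden_states.masked_fill(mask == 0, -1e9).max(dim=1).values

        fused = torch.cat([cls_pool, mean_pool, max_pool], dim=-1)
        return self.classifier(self.dropout(fused))

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = MultiPoolingFusionClassifier(num_labels=2)
batch = tokenizer(["example comment text"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])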

KCRL@DravidianLangTech 2025: Multi-View Feature Fusion with XLM-R for Tamil Political Sentiment Analysis
Md Ayon Mia | Fariha Haq | Md. Tanvir Ahammed Shawon | Golam Sarwar Md. Mursalin | Muhammad Ibrahim Khan
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Political discourse on social media platforms significantly influences public opinion, necessitating accurate sentiment analysis for understanding societal perspectives. This paper presents a system developed for the shared task of Political Multiclass Sentiment Analysis in Tamil tweets. The task aims to classify tweets into seven distinct sentiment categories: Substantiated, Sarcastic, Opinionated, Positive, Negative, Neutral, and None of the above. We propose a Multi-View Feature Fusion (MVFF) architecture that leverages XLM-R with a CLS-Attention-Mean mechanism for sentiment classification. Our experimental results demonstrate the effectiveness of our approach, achieving a macro-average F1-score of 0.37 on the test set and securing the 2nd position in the shared task. Through comprehensive error analysis, we identify specific classification challenges and demonstrate how our model effectively navigates the linguistic complexities of Tamil political discourse while maintaining robust classification performance across multiple sentiment categories.
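
The CLS-Attention-Mean fusion mentioned in the abstract could look roughly like the sketch below, assuming a Hugging Face XLM-R encoder and a simple learned attention-pooling layer; all hyperparameters and the attention layer itself are illustrative assumptions rather than the authors' exact setup.

# Sketch: CLS + attention-pooling + mean-pooling fusion over XLM-R for 7-way classification (illustrative only).
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiViewFusionClassifier(nn.Module):
    def __init__(self, model_name="xlm-roberta-base", num_labels=7, dropout=0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Learned scoring vector that produces one attention weight per token.
        self.attn_score = nn.Linear(hidden, 1)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(3 * hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()

        cls_view = h[:, 0]                                        # sentence-start token
        scores = self.attn_score(h).masked_fill(mask == 0, -1e9)  # (batch, seq_len, 1)
        attn = torch.softmax(scores, dim=1)
        attn_view = (attn * h).sum(dim=1)                         # attention-weighted sum
        mean_view = (h * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

        fused = torch.cat([cls_view, attn_view, mean_view], dim=-1)
        return self.classifier(self.dropout(fused))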

BANMIME: Misogyny Detection with Metaphor Explanation on Bangla Memes
Md Ayon Mia | Akm Moshiur Rahman Mazumder | Khadiza Sultana Sayma | Md Fahim | Md Tahmid Hasan Fuad | Muhammad Ibrahim Khan | Akmmahbubur Rahman
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Detecting misogyny in multimodal content remains a notable challenge, particularly in culturally conservative and low-resource contexts like Bangladesh. While existing research has explored hate speech and general meme classification, the nuanced identification of misogyny in Bangla memes, rich in metaphor, humor, and visual-textual interplay, remains severely underexplored. To address this gap, we introduce BanMiMe, the first comprehensive Bangla misogynistic meme dataset, comprising 2,000 culturally grounded samples where each meme includes misogyny labels, humor categories, metaphor localization, and detailed human-written explanations. We benchmark the performance of various open- and closed-source vision-language models (VLMs) under zero-shot and prompt-based settings and evaluate their capacity for both classification and explanation generation. Furthermore, we systematically explore multiple fine-tuning strategies, including standard, data-augmented, and Chain-of-Thought (CoT) supervision. Our results demonstrate that CoT-based fine-tuning consistently enhances model performance, both in terms of accuracy and in generating meaningful explanations. We envision BanMiMe as a foundational resource for advancing explainable multimodal moderation systems in low-resource and culturally sensitive settings.
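
One way to picture the Chain-of-Thought supervision mentioned in the abstract is to pair the misogyny label with the human-written metaphor and explanation so the model learns to explain before classifying; the sketch below is purely illustrative, and the prompt wording and field names are hypothetical placeholders, not the dataset's actual schema.

# Sketch: assembling a CoT-style supervision target from a BanMiMe-like record (hypothetical fields).
import json

def build_cot_example(record: dict) -> dict:
    prompt = (
        "You are given a Bangla meme (image plus overlaid text).\n"
        f"Meme text: {record['caption']}\n"
        "Explain the metaphor and decide whether the meme is misogynistic."
    )
    # Reasoning (metaphor + explanation) precedes the label in the target,
    # so the model is trained to explain first and classify second.
    target = (
        f"Metaphor: {record['metaphor']}\n"
        f"Reasoning: {record['explanation']}\n"
        f"Answer: {record['label']}"
    )
    return {"image": record["image_path"], "prompt": prompt, "target": target}

example = build_cot_example({
    "image_path": "memes/0001.jpg",
    "caption": "<overlaid meme text>",
    "metaphor": "<localized metaphor>",
    "explanation": "<human-written explanation>",
    "label": "misogynistic",
})
print(json.dumps(example, ensure_ascii=False, indent=2))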