2025
MarsadLab at AraGenEval Shared Task: LLM-Based Approaches to Arabic Authorship Style Transfer and Identification
Md. Rafiul Biswas | Mabrouka Bessghaier | Firoj Alam | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
MarsadLab at AraHealthQA: Hybrid Contextual–Lexical Fusion with AraBERT for Question and Answer Categorization
Mabrouka Bessghaier | Shimaa Ibrahim | Md. Rafiul Biswas | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
MarsadLab at BAREC Shared Task 2025: Strict-Track Readability Prediction with Specialized AraBERT Models on BAREC
Shimaa Ibrahim | Md. Rafiul Biswas | Mabrouka Bessghaier | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
ImageEval 2025: The First Arabic Image Captioning Shared Task
Ahlam Bashiti | Alaa Aljabari | Hadi Khaled Hamoud | Md. Rafiul Biswas | Bilal Mohammed Shalash | Mustafa Jarrar | Fadi Zaraket | George Mikros | Ehsaneddin Asgari | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
We present ImageEval 2025, the first shared task dedicated to Arabic image captioning. The task addresses the critical gap in multimodal Arabic NLP by focusing on two complementary subtasks: (1) creating the first open-source, manually-captioned Arabic image dataset through a collaborative datathon, and (2) developing and evaluating Arabic image captioning models. A total of 44 teams registered, of which eight submitted during the test phase, producing 111 valid submissions. Evaluation was conducted using automatic metrics, LLM-based judgment, and human assessment. In Subtask 1, the best-performing system achieved a cosine similarity of 65.5, while in Subtask 2, the top score was 60.0. Although these results show encouraging progress, they also confirm that Arabic image captioning remains a challenging task, particularly due to cultural grounding requirements, morphological richness, and dialectal variation. All datasets, baseline models, and evaluation tools are released publicly to support future research in Arabic multimodal NLP.
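As an illustration of the cosine-similarity evaluation mentioned above, the following minimal Python sketch scores predicted Arabic captions against references using sentence embeddings. The embedding checkpoint and the 0-100 scaling are assumptions for illustration only; the shared task's official scorer is not specified in the abstract.

    # Illustrative sketch: embedding-based cosine similarity between predicted
    # and reference Arabic captions. The multilingual model below is an assumed
    # choice, not necessarily the one used by the ImageEval organizers.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

    def caption_cosine(predictions, references):
        """Mean cosine similarity (scaled to 0-100) over aligned caption pairs."""
        pred_emb = model.encode(predictions, convert_to_tensor=True, normalize_embeddings=True)
        ref_emb = model.encode(references, convert_to_tensor=True, normalize_embeddings=True)
        # With normalized embeddings, the dot product equals cosine similarity.
        sims = (pred_emb * ref_emb).sum(dim=-1)
        return 100.0 * sims.mean().item()

    if __name__ == "__main__":
        preds = ["رجل يركب دراجة في الشارع"]
        refs = ["شخص يقود دراجة هوائية في الطريق"]
        print(f"cosine score: {caption_cosine(preds, refs):.1f}")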
MAHED Shared Task: Multimodal Detection of Hope and Hate Emotions in Arabic Content
Wajdi Zaghouani | Md. Rafiul Biswas | Mabrouka Bessghaier | Shimaa Ibrahim | George Mikros | Abul Hasnat | Firoj Alam
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
This paper presents the MAHED 2025 Shared Task on Multimodal Detection of Hope and Hate Emotions in Arabic Content, comprising three subtasks: (1) text-based classification of Arabic content into hate and hope, (2) multi-task learning for joint prediction of emotions, offensive content, and hate speech, and (3) multimodal detection of hateful content in Arabic memes. We provide three high-quality datasets totaling over 22,000 instances sourced from social media platforms, annotated by native Arabic speakers with Cohen’s Kappa exceeding 0.85. Our evaluation attracted 46 leaderboard submissions, with systems leveraging Arabic-specific pre-trained language models (AraBERT, MARBERT), large language models (GPT-4, Gemini), and multimodal fusion architectures combining CLIP vision encoders with Arabic text models. The best-performing systems achieved macro F1-scores of 0.723 (Task 1), 0.578 (Task 2), and 0.796 (Task 3), with top teams employing ensemble methods, class-weighted training, and OCR-aware multimodal fusion. Analysis reveals persistent challenges in dialectal robustness and minority-class detection for hope speech, and highlights key directions for future Arabic content moderation research.
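As a rough illustration of the class-weighted training reported by top teams for the text subtask, here is a minimal sketch using an Arabic encoder. The checkpoint, label inventory, and class counts below are assumptions for illustration, not the shared task's official baseline.

    # Illustrative sketch: class-weighted fine-tuning of an Arabic encoder for
    # hate/hope-style text classification. Labels and counts are hypothetical.
    import torch
    from torch.nn import CrossEntropyLoss
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL = "aubmindlab/bert-base-arabertv02"   # one possible Arabic encoder
    LABELS = ["neutral", "hope", "hate"]        # assumed label inventory

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=len(LABELS))

    # Inverse-frequency class weights to counter minority-class (hope) imbalance.
    class_counts = torch.tensor([9000.0, 1500.0, 4500.0])   # hypothetical counts
    weights = class_counts.sum() / (len(LABELS) * class_counts)
    loss_fn = CrossEntropyLoss(weight=weights)

    def training_step(texts, labels):
        batch = tokenizer(texts, padding=True, truncation=True, max_length=128,
                          return_tensors="pt")
        logits = model(**batch).logits
        return loss_fn(logits, torch.tensor(labels))

    loss = training_step(["مثال نصي عربي"], [1])
    loss.backward()   # gradients flow through the weighted loss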
MarsadLab at NADI Shared Task: Arabic Dialect Identification and Speech Recognition using ECAPA-TDNN and Whisper
Md. Rafiul Biswas | Kais Attia | Shimaa Ibrahim | Mabrouka Bessghaier | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
MarsadLab at PalmX Shared Task: An LLM Benchmark for Arabic Culture and Islamic Civilization
Md. Rafiul Biswas | Shimaa Ibrahim | Kais Attia | Firoj Alam | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
MarsadLab at TAQEEM 2025: Prompt-Aware Lexicon-Enhanced Transformer for Arabic Automated Essay Scoring
Mabrouka Bessghaier | Md. Rafiul Biswas | Amira Dhouib | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
2024
ArAIEval Shared Task: Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content
Maram Hasanain | Md. Arid Hasan | Fatema Ahmad | Reem Suwaileh | Md. Rafiul Biswas | Wajdi Zaghouani | Firoj Alam
Proceedings of the Second Arabic Natural Language Processing Conference
We present an overview of the second edition of the ArAIEval shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. In this edition, ArAIEval offers two tasks: (i) detection of propagandistic textual spans with persuasion technique identification in tweets and news articles, and (ii) distinguishing between propagandistic and non-propagandistic memes. A total of 14 teams participated in the final evaluation phase, with 6 and 9 teams participating in Tasks 1 and 2, respectively. Finally, 11 teams submitted system description papers. Across both tasks, we observed that fine-tuning transformer models such as AraBERT was at the core of the majority of the participating systems. We describe the task setup, including the dataset construction and the evaluation setup, and provide a brief overview of the participating systems. All datasets and evaluation scripts are released to the research community. We hope this will enable further research on these important tasks in Arabic.
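For readers unfamiliar with the fine-tuning recipe noted above, the following is a minimal, hypothetical sketch of fine-tuning an Arabic encoder (AraBERT as an example) for binary propagandistic vs. non-propagandistic classification with the Hugging Face Trainer. The hyperparameters and dummy data are purely illustrative and do not reproduce any participating system.

    # Illustrative sketch: Trainer-based fine-tuning of AraBERT as a binary
    # propaganda classifier. The two-example dataset is dummy data.
    from datasets import Dataset
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    checkpoint = "aubmindlab/bert-base-arabertv02"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    raw = Dataset.from_dict({
        "text": ["نص تجريبي أول", "نص تجريبي ثان"],
        "label": [0, 1],
    })
    encoded = raw.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=128))

    args = TrainingArguments(
        output_dir="araieval-baseline",     # hypothetical output directory
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
    )
    Trainer(model=model, args=args, train_dataset=encoded, tokenizer=tokenizer).train()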
MemeMind at ArAIEval Shared Task: Generative Augmentation and Feature Fusion for Multimodal Propaganda Detection in Arabic Memes through Advanced Language and Vision Models
Uzair Shah | Md. Rafiul Biswas | Marco Agus | Mowafa Househ | Wajdi Zaghouani
Proceedings of the Second Arabic Natural Language Processing Conference
Detecting propaganda in multimodal content, such as memes, is crucial for combating disinformation on social media. This paper presents a novel approach for Task 2 of the ArAIEval 2024 shared task, Multimodal Propagandistic Memes Classification, which involves text, image, and multimodal classification of Arabic memes. For text classification (Task 2A), we fine-tune state-of-the-art Arabic language models and use ChatGPT4-generated synthetic text for data augmentation. For image classification (Task 2B), we fine-tune ResNet18, EfficientFormerV2, and ConvNeXt-tiny architectures with DALL-E-2-generated synthetic images. For multimodal classification (Task 2C), we combine ConvNeXt-tiny and BERT architectures in a fusion layer to enhance binary classification. Our results show significant performance improvements with data augmentation for the text and image classification models and with the fusion layer for multimodal classification. We highlight challenges and opportunities for future research in multimodal propaganda detection in Arabic content, emphasizing the need for robust and adaptable models to combat disinformation.
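A minimal sketch of the kind of late fusion described for Task 2C follows: ConvNeXt-tiny image features are concatenated with the [CLS] representation of a BERT-family text encoder before a small classification head. The text checkpoint, layer sizes, and dropout below are assumptions, not the authors' exact configuration.

    # Illustrative sketch: ConvNeXt-tiny + BERT late fusion for binary meme
    # classification. Checkpoints and head dimensions are assumed values.
    import torch
    import torch.nn as nn
    from torchvision.models import convnext_tiny
    from transformers import AutoModel, AutoTokenizer

    class MemeFusionClassifier(nn.Module):
        def __init__(self, text_ckpt="aubmindlab/bert-base-arabertv02"):
            super().__init__()
            self.image_encoder = convnext_tiny(weights=None)
            self.image_encoder.classifier[2] = nn.Identity()          # 768-d image features
            self.text_encoder = AutoModel.from_pretrained(text_ckpt)  # 768-d [CLS] features
            self.fusion = nn.Sequential(
                nn.Linear(768 + 768, 256), nn.ReLU(), nn.Dropout(0.2),
                nn.Linear(256, 2),                                     # propagandistic / not
            )

        def forward(self, pixel_values, input_ids, attention_mask):
            img = self.image_encoder(pixel_values)                     # (B, 768)
            txt = self.text_encoder(input_ids=input_ids,
                                    attention_mask=attention_mask).last_hidden_state[:, 0]
            return self.fusion(torch.cat([img, txt], dim=-1))

    tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv02")
    model = MemeFusionClassifier()
    enc = tokenizer(["نص مكتوب على الميم"], return_tensors="pt", padding=True)
    logits = model(torch.randn(1, 3, 224, 224), enc["input_ids"], enc["attention_mask"])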
MemeMind at ArAIEval Shared Task: Spotting Persuasive Spans in Arabic Text with Persuasion Techniques Identification
Md. Rafiul Biswas | Zubair Shah | Wajdi Zaghouani
Proceedings of the Second Arabic Natural Language Processing Conference
This paper focuses on detecting propagandistic spans and persuasion techniques in Arabic text from tweets and news paragraphs. Each entry in the dataset contains a text sample and corresponding labels that indicate the start and end positions of propaganda techniques within the text. Tokens falling within a labeled span were assigned ’B’ (Begin) or ’I’ (Inside) tags for the specific propaganda technique, while tokens outside any span were tagged ’O’ (Outside). Using attention masks, we standardized each span to a uniform length and assigned BIO tags to each token based on the provided labels. Then, we used the AraBERT-base pre-trained model for Arabic text tokenization and embeddings, with a token classification layer to identify propaganda techniques. Our training process involves a two-phase fine-tuning approach: first, we train only the classification layer for a few epochs, then we fine-tune the full model, updating all parameters. This methodology allows the model to adapt to the specific characteristics of the propaganda detection task while leveraging the knowledge captured by the pretrained AraBERT model. Our approach achieved an F1 score of 0.2774, securing 3rd position on the Task 1 leaderboard.
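The two-phase fine-tuning described above can be sketched as follows: freeze the AraBERT encoder while training only the token-classification head, then unfreeze everything for full fine-tuning. The BIO label list, learning rates, and dummy batch below are illustrative assumptions rather than the paper's exact values.

    # Illustrative sketch: BIO token classification with AraBERT and two-phase
    # fine-tuning (head-only warm-up, then full-model updates).
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    # One B-/I- pair per persuasion technique plus 'O'; two techniques shown here.
    labels = ["O", "B-Loaded_Language", "I-Loaded_Language",
              "B-Name_Calling", "I-Name_Calling"]
    checkpoint = "aubmindlab/bert-base-arabertv02"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=len(labels))

    def set_encoder_trainable(trainable: bool):
        for p in model.bert.parameters():
            p.requires_grad = trainable

    # Phase 1: freeze the AraBERT encoder and train only the classification head.
    set_encoder_trainable(False)
    optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-3)

    # One illustrative step with dummy all-'O' tags (real training loops over batches).
    enc = tokenizer("هذا مثال على نص دعائي", return_tensors="pt")
    bio = torch.zeros_like(enc["input_ids"])
    loss = model(**enc, labels=bio).loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()

    # Phase 2: unfreeze everything and fine-tune the full model at a smaller LR.
    set_encoder_trainable(True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)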
So Hateful! Building a Multi-Label Hate Speech Annotated Arabic Dataset
Wajdi Zaghouani | Hamdy Mubarak | Md. Rafiul Biswas
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Social media enables widespread propagation of hate speech targeting groups based on ethnicity, religion, or other characteristics. With manual content moderation being infeasible given the volume, automatic hate speech detection is essential. This paper analyzes 70,000 Arabic tweets, from which 15,965 tweets were selected and annotated, to identify hate speech patterns and train classification models. Annotators labeled the Arabic tweets for offensive content, hate speech, emotion intensity and type, effect on readers, humor, factuality, and spam. Key findings reveal 15% of tweets contain offensive language while 6% have hate speech, mostly targeted towards groups with common ideological or political affiliations. Annotations capture diverse emotions, and sarcasm is more prevalent than humor. Additionally, 10% of tweets provide verifiable factual claims, and 7% are deemed important. For hate speech detection, deep learning models like AraBERT outperform classical machine learning approaches. By providing insights into hate speech characteristics, this work enables improved content moderation and reduced exposure to online hate. The annotated dataset advances Arabic natural language processing research and resources.