Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

Kareem Darwish, Ahmed Ali, Ibrahim Abu Farha, Samia Touileb, Imed Zitouni, Ahmed Abdelali, Sharefah Al-Ghamdi, Sakhar Alkhereyf, Wajdi Zaghouani, Salam Khalifa, Badr AlKhamissi, Rawan Almatham, Injy Hamed, Zaid Alyafeai, Areeb Alowisheq, Go Inoue, Khalil Mrini, Waad Alshammari (Editors)


Anthology ID: 2025.arabicnlp-sharedtasks
Month: November
Year: 2025
Address: Suzhou, China
Venue: ArabicNLP
Publisher: Association for Computational Linguistics
URL: https://preview.aclanthology.org/ingest-emnlp/2025.arabicnlp-sharedtasks/
ISBN: 979-8-89176-356-2
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.arabicnlp-sharedtasks.pdf

The AraGenEval Shared Task on Arabic Authorship Style Transfer and AI Generated Text Detection
Shadi Abudalfa | Saad Ezzini | Ahmed Abdelali | Hamza Alami | Abdessamad Benlahbib | Salmane Chafik | Mo El-Haj | Abdelkader El Mahdaouy | Mustafa Jarrar | Salima Lamsiyah | Hamzah Luqman

We present an overview of the AraGenEval shared task, organized as part of the ArabicNLP 2025 conference. This task introduced the first benchmark suite for Arabic authorship analysis, featuring three subtasks: Authorship Style Transfer, Authorship Identification, and AI-Generated Text Detection. We curated high-quality datasets, including over 47,000 paragraphs from 21 authors and a balanced corpus of human- and AI-generated texts. The task attracted significant global participation, with 72 registered teams from 16 countries. The results highlight the effectiveness of transformer-based models, with top systems leveraging prompt engineering for style transfer, model ensembling for authorship identification, and a mix of multilingual and Arabic-specific models for AI text detection. This paper details the task design, datasets, participant systems, and key findings, establishing a foundation for future research in Arabic stylistics and trustworthy NLP.

MISSION at AraGenEval Shared Task: Enhanced Arabic Authority Classification
Thamer Maseer Alharbi

Nojoom.AI at AraGenEval Shared Task: Arabic Authorship Style Transfer
Hafsa Kara Achira | Mourad Bouache | Mourad Dahmane

LMSA at AraGenEval Shared Task: Ensemble-Based Detection of AI-Generated Arabic Text Using Multilingual and Arabic-Specific Models
Kaoutar Zita | Attia Nehar | Abdelkader Khelil | Slimane Bellaouar | Hadda Cherroun

Amr&MohamedSabaa at AraGenEval shared task: Arabic Authorship Identification using Term Frequency – Inverse Document Frequency Features with Supervised Machine Learning
Amr Sabaa | Mohamed Sabaa

NLP_wizard at AraGenEval shared task: Embedding-Based Classification for AI Detection and Authorship Attribution
Mena Hany

PTUK-HULAT at AraGenEval Shared Task: Fine-tuning XLM-RoBERTa for AI-Generated Arabic News Detection
Tasneem Duridi | Areej Jaber | Paloma Martínez

ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer
Omer Nacar | Mahmoud Reda | Serry Sibaee | Yasser Alhabashi | Adel Ammar | Wadii Boulila

Athership at AraGenEval Shared Task: Identifying Arabic Authorship with a Dual-Model Logit Fusion
Mohamed Amin | Mahmoud Rady | Mariam Hossam | Sara Gaballa | Eman Samir | Maria Bassem | Nisreen Hisham | Ayman Khalafallah

Sebaweh at AraGenEval Shared Task: BERENSE - BERt based ENSEmbler for Arabic Authorship Identification
Muhammad Helmy | Batool Najeh Balah | Ahmed Mohamed Sallam | Ammar Sherif

CUET-NLP_Team_SS306 at AraGenEval Shared Task: A Transformer-based Framework for Detecting AI-Generated Arabic Text
Sowrav Nath | Shadman Saleh | Kawsar Ahmed | Mohammed Moshiul Hoque

BUSTED at ARATECT Shared Task: A Comparative Study of Transformer-Based Models for Arabic AI-Generated Text Detection
Ali Zain | Sareem Farooqui | Muhammad Rafi

CIOL at AraGenEval shared task: Authorship Identification and AI Generated Text Detection in Arabic using Pretrained Models
Sadia Tasnim Meem | Azmine Toushik Wasi

Osint at AraGenEval shared task: Fine-Tuned Modeling for Tracking Style Signatures and AI Generation in Arabic Texts
Shifali Agrahari | Hemanth Prakash Simhadri | Ashutosh Kumar Verma | Ranbir Singh Sanasam

MarsadLab at AraGenEval Shared Task: LLM-Based Approaches to Arabic Authorship Style Transfer and Identification
Md. Rafiul Biswas | Mabrouka Bessghaier | Firoj Alam | Wajdi Zaghouani

REGLAT at AraGenEval shared task: Morphology-Aware AraBERT for Detecting Arabic AI-Generated Text
Mariam Labib | Nsrin Ashraf | Mohammed Aldawsari | Hamada Nayel

Jenin at AraGenEval Shared Task: Parameter-Efficient Fine-Tuning and Layer-Wise Analysis of Arabic LLMs for Authorship Style Transfer and Classification
Huthayfa Malhis | Mohammad Tami | Huthaifa I. Ashqar

AraHealthQA 2025: The First Shared Task on Arabic Health Question Answering
Hassan Alhuzali | Farah E. Shamout | Muhammad Abdul-Mageed | Chaimae Abouzahir | Mouath Abu Daoud | Ashwag Alasmari | Walid Al-Eisawi | Renad Al-Monef | Ali Alqahtani | Lama Ayash | Nizar Habash | Leen Kharouf

We introduce AraHealthQA 2025, the Comprehensive Arabic Health Question Answering Shared Task, held in conjunction with ArabicNLP 2025 co-located with EMNLP 2025. This shared task addresses the paucity of high-quality Arabic medical QA resources by offering two complementary tracks: MentalQA, focusing on Arabic mental health Q&A (e.g., anxiety, depression, stigma reduction), and MedArabiQ, covering broader medical domains such as internal medicine, pediatrics, and clinical decision making. Each track comprises multiple subtasks, evaluation datasets, and standardized metrics, facilitating fair benchmarking. The task was structured to promote modeling under realistic, multilingual, and culturally nuanced healthcare contexts. We outline the dataset creation, task design and evaluation framework, participation statistics, and baseline systems, and summarize the overall outcomes. We conclude with reflections on the performance trends observed and prospects for future iterations in Arabic health QA.

NYUAD at AraHealthQA Shared Task: Benchmarking the Medical Understanding and Reasoning of Large Language Models in Arabic Healthcare Tasks
Nouar AlDahoul | Yasir Zaki

MedLingua at MedArabiQ2025: Zero- and Few-Shot Prompting of Large Language Models for Arabic Medical QA
Fatimah Mohamed Emad Elden | Mumina Ab. Abukar

Sakinah-AI at MentalQA: A Comparative Study of Few-Shot, Optimized, and Ensemble Methods for Arabic Mental Health Question Classification
Fatimah Mohamed Emad Elden | Mumina Ab. Abukar

MindLLM at AraHealthQA 2025 Track 1: Leveraging Large Language Models for Mental Health Question Answering
Nejood Abdulaziz Bin Eshaq

Quasar at AraHealthQA Track 1: Leveraging Zero-Shot Large Language Models for Question and Answer Categorization in Arabic Mental Health
Adiba Fairooz Chowdhury | Md Sagor Chowdhury

Binary_Bunch at AraHealthQA Track 1: Arabic Mental Health Q&A Classification Using Data Augmentation and Transformer Models
Sajib Bhattacharjee | Ratnajit Dhar | Kawsar Ahmed | Mohammed Moshiul Hoque

!MSA at AraHealthQA 2025 Shared Task: Enhancing LLM Performance for Arabic Clinical Question Answering through Prompt Engineering and Ensemble Learning
Mohamed Younes | Seif Ahmed | Mohamed Basem

Sindbad at AraHealthQA Track 1: Leveraging Large Language Models for Mental Health Q&A
AbdulRahman A. Morsy | Saad Mankarious | Ayah Zirikly

Arabic Mental Health Question Answering: A Multi-Task Approach with Advanced Retrieval-Augmented Generation
Abdelaziz Amr AbdelAziz | Mohamed Ahmed Youssef | Mamdouh Mohamed Koritam | Marwa Eldeeb | Ensaf Hussein

AraMinds at AraHealthQA 2025: A Retrieval-Augmented Generation System for Fine-Grained Classification and Answer Generation of Arabic Mental Health Q&A
Mohamed Zaytoon | Ahmed Mahmoud Salem | Ahmed Sakr | Hossam Elkordi

Fahmni at AraHealthQA Track 1: Multi-Agent Retrieval-Augmented Generation and Multi-Label Classification for Arabic Mental Health Q&A
Caroline Sabty | Mohamad Rasmy | Mohamed Eyad Badran | Nourhan Sakr | Alia El Bolock

MedGapGab at AraHealthQA: Modular LLM Assignment for Gaps and Gabs in Arabic Medical Question Answering
Baraa Hikal

Egyhealth at General Arabic Health QA (MedArabiQ): An Enhanced RAG Framework with Large-Scale Arabic Q&A Medical Data
Hossam Amer | Rawan Tarek Taha | Gannat Elsayed | Ensaf Hussein Mohamed

mucAI at AraHealthQA 2025: Explain–Retrieve–Verify (ERV) Workflow for Multi-Label Arabic Health QA Classification
Ahmed Abdou

MarsadLab at AraHealthQA: Hybrid Contextual–Lexical Fusion with AraBERT for Question and Answer Categorization
Mabrouka Bessghaier | Shimaa Ibrahim | Md. Rafiul Biswas | Wajdi Zaghouani

BAREC Shared Task 2025 on Arabic Readability Assessment
Khalid N. Elmadani | Bashar Alhafni | Hanada Taha | Nizar Habash

We present the results and findings of the BAREC Shared Task 2025 on Arabic Readability Assessment, organized as part of The Third Arabic Natural Language Processing Conference (ArabicNLP 2025). The BAREC 2025 shared task focuses on automatic readability assessment using the BAREC Corpus, addressing fine-grained classification into 19 readability levels. The shared task includes two sub-tasks, sentence-level classification and document-level classification, and three tracks: (1) Strict Track, where only the BAREC Corpus is allowed; (2) Constrained Track, restricted to the BAREC Corpus, SAMER Corpus, and SAMER Lexicon; and (3) Open Track, allowing any external resources. A total of 22 teams from 12 countries registered for the task. Among these, 17 teams submitted system description papers. The winning team achieved 87.5 QWK on the sentence-level task and 87.4 QWK on the document-level task.

Syntaxa at BAREC Shared Task 2025: BERTnParse - Fusion of BERT and Dependency Graphs for Readability Prediction
Ahmed Bahloul

GNNinjas at BAREC Shared Task 2025: Lexicon-Enriched Graph Modeling for Arabic Document Readability Prediction
Passant Elchafei | Mayar Osama | Mohamad Rageh | Mervat Abu-Elkheir

ZAI at BAREC Shared Task 2025: AraBERT CORAL for Fine Grained Arabic Readability
Ahmad M. Nazzal

ANLPers at BAREC Shared Task 2025: Readability of Embeddings Training Neural Readability Classifiers on the BAREC Corpus
Serry Sibaee | Omer Nacar | Yasser Alhabashi | Adel Ammar | Wadii Boulila

MarsadLab at BAREC Shared Task 2025: Strict-Track Readability Prediction with Specialized AraBERT Models on BAREC
Shimaa Ibrahim | Md. Rafiul Biswas | Mabrouka Bessghaier | Wajdi Zaghouani

SATLab at BAREC Shared Task 2025: Optimizing a Language-Independent System for Fine-Grained Readability Assessment
Yves Bestgen

MorphoArabia at BAREC Shared Task 2025: A Hybrid Architecture with Morphological Analysis for Arabic Readability Assessment
Fatimah Mohamed Emad Elden

!MSA at BAREC Shared Task 2025: Ensembling Arabic Transformers for Readability Assessment
Mohamed Basem | Mohamed Younes | Seif Ahmed | Abdelrahman Moustafa

Qais at BAREC Shared Task 2025: A Fine-Grained Approach for Arabic Readability Classification Using a pre-trained model
Samar Ahmad

mucAI at BAREC Shared Task 2025: Towards Uncertainty Aware Arabic Readability Assessment
Ahmed Abdou

AMAR at BAREC Shared Task 2025: Arabic Meta-learner for Assessing Readability
Mostafa Saeed | Rana Waly | Abdelaziz Ashraf Hussein

Noor at BAREC Shared Task 2025: A Hybrid Transformer-Feature Architecture for Sentence-level Readability Assessment
Nour Rabih

PalNLP at BAREC Shared Task 2025: Predicting Arabic Readability Using Ordinal Regression and K-Fold Ensemble Learning
Mutaz Ayesh

Pixels at BAREC Shared Task 2025: Visual Arabic Readability Assessment
Ben Sapirstein

Phantoms at BAREC Shared Task 2025: Enhancing Arabic Readability Prediction with Hybrid BERT and Linguistic Features
Ahmed Alhassan | Asim Mohamed | Moayad Elamin

STBW at BAREC Shared Task 2025: AraBERT-v2 with MSE-SoftQWK Loss for Sentence-Level Arabic Readability
Saoussan Trigui

LIS at BAREC Shared Task 2025: Multi-Scale Curriculum Learning for Arabic Sentence-Level Readability Assessment Using Pre-trained Language Models
Anya Amel Nait Djoudi | Patrice Bellot | Adrian-Gabriel Chifu

ImageEval 2025: The First Arabic Image Captioning Shared Task
Ahlam Bashiti | Alaa Aljabari | Hadi Khaled Hamoud | Md. Rafiul Biswas | Bilal Mohammed Shalash | Mustafa Jarrar | Fadi Zaraket | George Mikros | Ehsaneddin Asgari | Wajdi Zaghouani

We present ImageEval 2025, the first shared task dedicated to Arabic image captioning. The task addresses the critical gap in multimodal Arabic NLP by focusing on two complementary subtasks: (1) creating the first open-source, manually-captioned Arabic image dataset through a collaborative datathon, and (2) developing and evaluating Arabic image captioning models. A total of 44 teams registered, of which eight submitted during the test phase, producing 111 valid submissions. Evaluation was conducted using automatic metrics, LLM-based judgment, and human assessment. In Subtask 1, the best-performing system achieved a cosine similarity of 65.5, while in Subtask 2, the top score was 60.0. Although these results show encouraging progress, they also confirm that Arabic image captioning remains a challenging task, particularly due to cultural grounding requirements, morphological richness, and dialectal variation. All datasets, baseline models, and evaluation tools are released publicly to support future research in Arabic multimodal NLP.

Codezone Research Group at ImageEval Shared-Task 2: Arabic Image Captioning Using BLIP and M2M100: A Two-Stage Translation Approach for ImageEval 2025
Abdulkadir Shehu Bichi

BZU-AUM@ImageEval2025: An Arabic Image Captioning Dataset for Conflict Narratives with Human Annotation
Mohammed Alkhanafseh | Ola Surakhi | Abdallah Abedaljalill

ImpactAi at ImageEval 2025 Shared Task: Region-Aware Transformers for Arabic Image Captioning – A Case Study on the Palestinian Narrative
Rabee Al-Qasem | Mohannad Hendi

VLCAP at ImageEval 2025 Shared Task: Multimodal Arabic Captioning with Interpretable Visual Concept Integration
Passant Elchafei | Amany Fashwan

PhantomTroupe at ImageEval 2025 Shared Task: Multimodal Arabic Image Captioning through Translation-Based Fine-Tuning of LLM Models
Muhammad Abu Horaira | Farhan Amin | Sakibul Hasan | Md. Tanvir Ahammed Shawon | Muhammad Ibrahim Khan

NU_Internship team at ImageEval 2025: From Zero-Shot to Ensembles: Enhancing Grounded Arabic Image Captioning
Rana Gaber | Seif Eldin Amgad | Ahmed Sherif Nasri | Mohamed Ibrahim Ragab | Ensaf Hussein Mohamed

Averroes at ImageEval 2025 Shared Task: Advancing Arabic Image Captioning with Augmentation and Two-Stage Generation
Mariam Saeed | Sarah Elshabrawy | Abdelrahman Hagrass | Mazen Yasser | Ayman Khalafallah

AZLU at ImagEval Shared Task: Bridging Linguistics and Cultural Gaps in Arabic Image Captioning
Sarah Yassine

Iqra’Eval: A Shared Task on Qur’anic Pronunciation Assessment
Yassine El Kheir | Amit Meghanani | Hawau Olamide Toyin | Nada Almarwani | Omnia Ibrahim | Yousseif Ahmed Elshahawy | Mostafa Shahin | Ahmed Ali

We present the findings of the first shared task on Qur’anic pronunciation assessment, which focuses on addressing the unique challenges of evaluating the precise pronunciation of Qur’anic recitation. To fill an existing research gap, the Iqra’Eval 2025 shared task introduces the first open benchmark for Mispronunciation Detection and Diagnosis (MDD) in Qur’anic recitation, using Modern Standard Arabic (MSA) reading of Qur’anic texts as its case study. The task provides a comprehensive evaluation framework with increasingly complex subtasks: error localization and detailed error diagnosis. Leveraging the recently developed QuranMB benchmark dataset along with auxiliary training resources, this shared task aims to stimulate research in an area of both linguistic and cultural significance while addressing computational challenges in pronunciation assessment.

Hafs2Vec: A System for the IqraEval Arabic and Qur’anic Phoneme-level Pronunciation Assessment
Ahmed Ibrahim

Phoneme-level mispronunciation detection in Quranic recitation using ShallowTransformer
Mohamed Nadhir Daoud | Mohamed Anouar Ben Messaoud

ANPLers at IqraEval Shared task: Adapting Whisper-large-v3 as Speech-to-Phoneme for Qur’anic Recitation Mispronunciation Detection
Nour Qandos | Serry Sibaee | Samar Ahmad | Omer Nacar | Adel Ammar | Wadii Boulila | Yasser Alhabashi

AraS2P: Arabic Speech-to-Phonemes System
Bassam Mattar | Mohamed Fayed | Ayman Khalafallah

Metapseud at Iqra’Eval: Domain Adaptation with Multi-Stage Fine-Tuning for Phoneme-Level Qur’anic Mispronunciation Detection
Ayman Mansour

IslamicEval 2025: The First Shared Task of Capturing LLMs Hallucination in Islamic Content
Hamdy Mubarak | Rana Malhas | Watheq Mansour | Abubakr Mohamed | Mahmoud Fawzi | Majd Hawasly | Tamer Elsayed | Kareem Mohamed Darwish | Walid Magdy

Hallucination in Large Language Models (LLMs) remains a significant challenge and continues to draw substantial research attention. The problem becomes especially critical when hallucinations arise in sensitive domains, such as religious discourse. To address this gap, we introduce IslamicEval 2025—the first shared task specifically focused on evaluating and detecting hallucinations in Islamic content. The task consists of two subtasks: (1) Hallucination Detection and Correction of quoted verses (Ayahs) from the Holy Quran and quoted Hadiths; and (2) Qur’an and Hadith Question Answering, which assesses retrieval models and LLMs by requiring answers to be retrieved from grounded, authoritative sources. Thirteen teams participated in the final phase of the shared task, employing a range of pipelines and frameworks. Their diverse approaches underscore both the complexity of the task and the importance of effectively managing hallucinations in Islamic discourse.

NUR at IslamicEval 2025 Shared Task: Retrieval-Augmented LLMs for Qur’an and Hadith QA
Serag Amin | Ranwa Aly | Yara Allam | Yomna Eid | Ensaf Hussein Mohamed

BurhanAI at IslamicEval 2025 Shared Task: Combating Hallucinations in LLMs for Islamic Content; Evaluation, Correction, and Retrieval-Based Solution
Arij Al Adel | Abu Bakr Soliman | Mohamed Sakher Sawan | Rahaf Al-Najjar | Sameh Amin

HUMAIN at IslamicEval 2025 Shared Task 1: A Three-Stage LLM-Based Pipeline for Detecting and Correcting Hallucinations in Quran and Hadith
Arwa Omayrah | Sakhar Alkhereyf | Ahmed Abdelali | Abdulmohsen Al-Thubaity | Jeril Kuriakose | Ibrahim AbdulMajeed

TCE at IslamicEval 2025: Retrieval-Augmented LLMs for Quranic and Hadith Content Identification and Verification
Mohammed ElKoumy | Khalid Allam | Ahmad Tamer | Mohamed Alqablawi

ThinkDrill at IslamicEval 2025 Shared Task: LLM Hybrid Approach for Qur’an and Hadith Question Answering
Eman Elrefai | Toka Khaled | Ahmed Soliman

Burhan at IslamicEval: Fact-Augmented and LLM-Driven Retrieval for Islamic QA
Mohammad Basheer | Watheq Mansour | Abdulhamid Touma | Ahmad Qadeib Alban

Isnad AI at IslamicEval 2025: A Rule-Based System for Identifying Religious Texts in LLM Outputs
Fatimah Mohamed Emad Elden

MAHED Shared Task: Multimodal Detection of Hope and Hate Emotions in Arabic Content
Wajdi Zaghouani | Md. Rafiul Biswas | Mabrouka Bessghaier | Shimaa Ibrahim | George Mikros | Abul Hasnat | Firoj Alam

This paper presents the MAHED 2025 Shared Task on Multimodal Detection of Hope and Hate Emotions in Arabic Content, comprising three subtasks: (1) text-based classification of Arabic content into hate and hope, (2) multi-task learning for joint prediction of emotions, offensive content, and hate speech, and (3) multimodal detection of hateful content in Arabic memes. We provide three high-quality datasets totaling over 22,000 instances sourced from social media platforms, annotated by native Arabic speakers with Cohen’s Kappa exceeding 0.85. Our evaluation attracted 46 leaderboard submissions from participants, with systems leveraging Arabic-specific pre-trained language models (AraBERT, MARBERT), large language models (GPT-4, Gemini), and multimodal fusion architectures combining CLIP vision encoders with Arabic text models. The best-performing systems achieved macro F1-scores of 0.723 (Task 1), 0.578 (Task 2), and 0.796 (Task 3), with top teams employing ensemble methods, class-weighted training, and OCR-aware multimodal fusion. Analysis reveals persistent challenges in dialectal robustness and in minority-class detection for hope speech, and highlights key directions for future Arabic content moderation research.

NYUAD at MAHED Shared Task: Detecting Hope, Hate, and Emotion in Arabic Textual Speech and Multi-modal Memes Using Large Language Models
Nouar AlDahoul | Yasir Zaki

NguyenTriet at MAHED Shared Task: Ensemble of Arabic BERT Models with Hierarchical Prediction and Soft Voting for Text-Based Hope and Hate Detection
Nguyen Minh Triet | Thìn Đặng Văn

ANLPers at MAHED2025: From Hate to Hope: Boosting Arabic Text Classification
Yasser Alhabashi | Serry Sibaee | Omer Nacar | Adel Ammar | Wadii Boulila

LoveHeaven at MAHED 2025: Text-based Hate and Hope Speech Classification Using AraBERT-Twitter Ensemble
Nguyễn Thiên Bảo | Dang Van Thin

CIC-NLP at MAHED 2025 TASK 1: Assessing the Role of Bigram Augmentation in Multiclass Arabic Hate and Hope Speech Classification
Tolulope Olalekan Abiola | Oluwatobi Joseph Abiola | Ogunleye Temitope Dasola | Tewodros Achamaleh | Obiadoh Augustine Ekenedilichukwu

TranTranUIT at MAHED Shared Task: Multilingual Transformer Ensemble with Advanced Data Augmentation and Optuna-based Hyperparameter Optimization
Trinh Tran Tran | Thìn Đặng Văn

YassirEA at MAHED 2025: Fusion-Based Multimodal Models for Arabic Hate Meme Detection
Yassir El Attar

AAA at MAHED Text-based Hate and Hope Speech Classification: A Systematic Encoder Evaluation for Arabic Hope and Hate Speech Classification
Ahmed Khalil Elzainy | Mohamed Amin | Ahmed Samir | Hazem Abdelsalam

CUET-823 at MAHED 2025 Shared Task: Large Language Model-Based Framework for Emotion, Offensive, and Hate Detection in Arabic
Ratnajit Dhar | Arpita Mallik

AraMinds at MAHED 2025: Leveraging Vision-Language Models and Contrastive Multi-task Learning for Multimodal Hate Speech Detection
Mohamed Zaytoon | Ahmed Mahmoud Salem | Ahmed Sakr | Hossam Elkordi

DesCartes-HOPE at MAHED Shared task 2025: Integrating Pragmatic Features for Arabic Hope and Hate Speech Detection
Leila Moudjari | Hacène-Cherkaski Mélissa | Farah Benamara

ANLP-UniSo at MAHED Shared Task: Detection of Hate and Hope Speech in Arabic Social Media based on XLM-RoBERTa and Logistic Regression
Yasmine El Abed | Mariem Ben Arbia | Saoussen Ben Chaabene | Omar Trigui

REGLAT at MAHED Shared Task: A Hybrid Ensemble-Based System for Arabic Hate Speech Detection
Nsrin Ashraf | Mariam Labib | Tarek Elshishtawy | Hamada Nayel

HTU at MAHED Shared Task: Ensemble-Based Classification of Arabic Hate and Hope Speech Using Pre-trained Dialectal Arabic Models
Abdallah Saleh | Mariam M Biltawi

SmolLab_SEU at MAHED Shared Task: Do Arabic-Native Encoders Surpass Multilingual Models in Detecting the Nuances of Hope, Hate, and Emotion?
Md Abdur Rahman | Md Sabbir Dewan | Md. Tofael Ahmed Bhuiyan | Md Ashiqur Rahman

Baoflowin502 at MAHED Shared Task: Text-based Hate and Hope Speech Classification
Nguyen Minh Bao | Dang Van Thin

AyahVerse at MAHED Shared Task: Fine-Tuning ArabicBERT with Preprocessing for Hope and Hate Detection
Ibad-ur-Rehman Rashid | Muhammad Hashir Khalil

MultiMinds at MAHED 2025: Multimodal and Multitask Approaches for Detecting Emotional, Hate, and Offensive Speech in Arabic Content
Riddhiman Debnath | Abdul Wadud Shakib | Md Saiful Islam

joy_2004114 at MAHED Shared Task: Filtering Hate Speech from Memes using A Multimodal Fusion-based Approach
Joy Das | Alamgir Hossain | Mohammed Moshiul Hoque

Quasar at MAHED Shared Task: Decoding Emotions and Offense in Arabic Text using LLM and Transformer-Based Approaches
Md Sagor Chowdhury | Adiba Fairooz Chowdhury

CUET_Zahra_Duo@Mahed 2025: Hate and Hope Speech Detection in Arabic Social Media Content using Transformer
Walisa Alam | Mehreen Rahman | Shawly Ahsan | Mohammed Moshiul Hoque

AraNLP at MAHED 2025 Shared Task: Using AraBERT for Text-based Hate and Hope Speech Classification
Wafaa S. El-Kassas | Enas A. Hakim Khalil

Thinking Nodes at MAHED: A Comparative Study of Multimodal Architectures for Arabic Hateful Meme Detection
Itbaan Safwan

NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task
Bashar Talafha | Hawau Olamide Toyin | Peter Sullivan | AbdelRahim A. Elmadany | Abdurrahman Juma | Amirbek Djanibekov | Chiyu Zhang | Hamad Alshehhi | Hanan Aldarmaki | Mustafa Jarrar | Nizar Habash | Muhammad Abdul-Mageed

We present the findings of the sixth Nuanced Arabic Dialect Identification (NADI 2025) Shared Task, which focused on Arabic speech dialect processing across three subtasks: spoken dialect identification (Subtask 1), speech recognition (Subtask 2), and diacritic restoration for spoken dialects (Subtask 3). A total of 44 teams registered, and during the testing phase, 100 valid submissions were received from eight unique teams. The distribution was as follows: 34 submissions for Subtask 1 (five teams), 47 submissions for Subtask 2 (six teams), and 19 submissions for Subtask 3 (two teams). The best-performing systems achieved 79.8% accuracy on Subtask 1, 35.68/12.20 WER/CER (overall average) on Subtask 2, and 55/13 WER/CER on Subtask 3. These results highlight the ongoing challenges of Arabic dialect speech processing, particularly in dialect identification, recognition, and diacritic restoration. We also summarize the methods adopted by participating teams and briefly outline directions for future editions of NADI.

Munsit at NADI 2025 Shared Task 2: Pushing the Boundaries of Multidialectal Arabic ASR with Weakly Supervised Pretraining and Continual Supervised Fine-tuning
Mahmoud Salhab | Shameed Sait | Mohammad Abusheikh | Hasan Abusheikh

Lahjati at NADI 2025 A ECAPA-WavLM Fusion with Multi-Stage Optimization
Sanad Albawwab | Omar Qawasmeh

Saarland-Groningen at NADI 2025 Shared Task: Effective Dialectal Arabic Speech Processing under Data Constraints
Badr M. Abdullah | Yusser Al Ghussin | Zena Al-Khalili | Ömer Tarik Özyilmaz | Matias Valdenegro-Toro | Simon Ostermann | Dietrich Klakow

MarsadLab at NADI Shared Task: Arabic Dialect Identification and Speech Recognition using ECAPA-TDNN and Whisper
Md. Rafiul Biswas | Kais Attia | Shimaa Ibrahim | Mabrouka Bessghaier | Wajdi Zaghouani

Abjad AI at NADI 2025: CATT-Whisper: Multimodal Diacritic Restoration Using Text and Speech Representations
Ahmad Ghannam | Naif Alharthi | Faris Alasmary | Kholood Al Tabash | Shouq Sadah | Lahouari Ghouti

ELYADATA & LIA at NADI 2025: ASR and ADI Subtasks
Haroun Elleuch | Youssef Saidi | Salima Mdhaffar | Yannick Estève | Fethi Bougares

Unicorn at NADI 2025 Subtask 3: GEMM3N-DR: Audio-Text Diacritic Restoration via Fine-tuning Multimodal Arabic LLM
Mohamed Lotfy Elrefai

PalmX 2025: The First Shared Task on Benchmarking LLMs on Arabic and Islamic Culture
Fakhraddin Alwajih | Abdellah El Mekki | Hamdy Mubarak | Majd Hawasly | Abubakr Mohamed | Muhammad Abdul-Mageed

Large Language Models (LLMs) inherently reflect the vast data distributions they encounter during their pre-training phase. As this data is predominantly sourced from the web, there is a high chance it will be skewed towards high-resourced languages and cultures, such as those of the West. Consequently, LLMs often exhibit a diminished understanding of certain communities, a gap that is particularly evident in their knowledge of Arabic and Islamic cultures. This issue becomes even more pronounced with increasingly under-represented topics. To address this critical challenge, we introduce PalmX 2025, the first shared task designed to benchmark the cultural competence of LLMs in these specific domains. The task is composed of two subtasks featuring multiple-choice questions (MCQs) in Modern Standard Arabic (MSA): General Arabic Culture and General Islamic Culture. These subtasks cover a wide range of topics, including traditions, food, history, religious practices, and language expressions from across 22 Arab countries. The initiative drew considerable interest, with 26 teams registering for Subtask 1 and 19 for Subtask 2, culminating in nine and six valid submissions, respectively. Our findings reveal that task-specific fine-tuning substantially boosts performance over baseline models. The top-performing systems achieved an accuracy of 72.15% on cultural questions and 84.22% on Islamic knowledge. Parameter-efficient fine-tuning emerged as the predominant and most effective approach among participants, while the utility of data augmentation was found to be domain-dependent. Ultimately, this benchmark provides a crucial, standardized framework to guide the development of more culturally grounded and competent Arabic LLMs. Results of the shared task demonstrate that general cultural and general religious knowledge remain challenging for LLMs, motivating us to continue to offer the shared task in the future.

Hamyaria at PalmX2025: Leveraging Large Language Models to Improve Arabic Multiple-Choice Questions in Cultural and Islamic Domains
Walid Al-Dhabyani | Hamzah A. Alsayadi

ISL-NLP at PalmX 2025: Retrieval-Augmented Fine-Tuning for Arabic Cultural Question Answering
Mohamed Gomaa | Noureldin Elmadany

ADAPTMTU HAI at PalmX 2025: Leveraging Full and Parameter‐Efficient LLM Fine‐Tuning for Arabic Cultural QA
Shehenaz Hossain | Haithem Afli

CultranAI at PalmX 2025: Data Augmentation for Cultural Knowledge Representation
Hunzalah Hassan Bhatti | Youssef Ahmed | Md Arid Hasan | Firoj Alam

MarsadLab at PalmX Shared Task: An LLM Benchmark for Arabic Culture and Islamic Civilization
Md. Rafiul Biswas | Shimaa Ibrahim | Kais Attia | Firoj Alam | Wajdi Zaghouani

Star at PalmX 2025: Arabic Cultural Understanding via Targeted Pretraining and Lightweight Fine-tuning
Eman Elrefai | Esraa Khaled | Alhassan Ehab

AYA at PalmX 2025: Modeling Cultural and Islamic Knowledge in LLMs
Jannatul Tajrin | Bir Ballav Roy | Firoj Alam

Cultura-Arabica: Probing and Enhancing Arabic Cultural Awareness in Large Language Models via LoRA
Pulkit Chatwal | Santosh Kumar Mishra

Phoenix at Palmx: Exploring Data Augmentation for Arabic Cultural Question Answering
Houdaifa Atou | Issam Ait Yahia | Ismail Berrada

QIAS 2025: Overview of the Shared Task on Islamic Inheritance Reasoning and Knowledge Assessment
Abdessalam Bouchekif | Samer Rashwani | Emad Soliman Ali Mohamed | Mutaz Alkhatib | Heba Sbahi | Shahd Gaben | Wajdi Zaghouani | Aiman Erbad | Mohammed Ghaly

This paper provides a comprehensive overview of the QIAS 2025 shared task, organized as part of the ArabicNLP 2025 conference and co-located with EMNLP 2025. The task was designed for the evaluation of large language models in the complex domains of religious and legal reasoning. It comprises two subtasks: (1) Islamic Inheritance Reasoning, requiring models to compute inheritance shares according to Islamic jurisprudence, and (2) Islamic Knowledge Assessment, which covers a range of traditional Islamic disciplines. Both subtasks were structured as multiple-choice question answering challenges, with questions stratified by varying difficulty levels. The shared task attracted significant interest, with 44 teams participating in the development phase, from which 18 teams advanced to the final test phase. Of these, 6 teams submitted entries for both subtasks, 8 for Task 1 only, and 2 for Task 2 only. Ultimately, 16 teams submitted system description papers. Herein, we detail the task’s motivation, dataset construction, and evaluation protocol, and summarize the participating systems and their results.

pdf bib
NYUAD at QIAS Shared Task: Benchmarking the Legal Reasoning of LLMs in Arabic Islamic Inheritance Cases
Nouar AlDahoul | Yasir Zaki

pdf bib
SHA at the QIAS Shared Task: LLMs for Arabic Islamic Inheritance Reasoning
Shatha Altammami

pdf bib
ANLPers at QIAS: CoT for Islamic Inheritance
Serry Sibaee | Mahmoud Reda | Omer Nacar | Yasser Alhabashi | Adel Ammar | Wadii Boulila

pdf bib
N&N at QIAS 2025: Chain-of-Thought Ensembles with Retrieval-Augmented framework for Classical Arabic Islamic
Nourah Alangari | Nouf AlShenaifi

pdf bib
HIAST at QIAS 2025: Retrieval-Augmented LLMs with Top-Hit Web Evidence for Arabic Islamic Reasoning QA
Mohamed Motasim Hamed | Nada Ghneim | Riad Sonbol

pdf bib
QU-NLP at QIAS 2025 Shared Task: A Two-Phase LLM Fine-Tuning and Retrieval-Augmented Generation Approach for Islamic Inheritance Reasoning
Mohammad AL-Smadi

pdf bib
Transformer Tafsir at QIAS 2025 Shared Task: Hybrid Retrieval-Augmented Generation for Islamic Knowledge Question Answering
Muhammad Abu Ahmad | Mohamad Ballout | Raia Abu Ahmad | Elia Bruni

pdf bib
PuxAI at QIAS 2025: Multi-Agent Retrieval-Augmented Generation for Islamic Inheritance and Knowledge Reasoning
Nguyen Xuan Phuc | Thìn Đặng Văn

pdf bib
Athar at QIAS2025: LLM-based Question Answering Systems for Islamic Inheritance and Classical Islamic Knowledge
Yossra Noureldien | Hassan Suliman | Farah Attallah | Abdelrazig Mohamed | Sara Abdalla

pdf bib
ADAPTMTU HAI at QIAS2025: Dual-Expert LLM Fine-Tuning and Constrained Decoding for Arabic Islamic Inheritance Reasoning
Shehenaz Hossain | Haithem Afli

pdf bib
CVPD at QIAS 2025 Shared Task: An Efficient Encoder-Based Approach for Islamic Inheritance Reasoning
Salah Eddine Bekhouche | Abdellah Zakaria Sellam | Telli Hichem | Cosimo Distante | Abdenour Hadid

pdf bib
CIS-RG at QIAS 2025 Shared Task: Approaches for Enhancing Performance of LLM on Islamic Legal Reasoning and its Mathematical Calculations
Osama Farouk Zaki

pdf bib
SEA-Team at QIAS 2025: Enhancing LLMs for Question Answering in Islamic Texts
Sanaa Alowaidi

pdf bib
MorAI at QIAS 2025: Collaborative LLM via Voting and Retrieval-Augmented Generation for Solving Complex Inheritance Problems
Jihad R’baiti | Chouaib El Hachimi | Youssef Hmamouche | Amal Seghrouchni

pdf bib
Gumball at QIAS 2025: Arabic LLM Automated Reasoning in Islamic Inheritance
Eman Elrefai | Mohamed Lotfy Elrefai | Aml Hassan Esmail

pdf bib
Tokenizers United at QIAS-2025: RAG-Enhanced Question Answering for Islamic Studies by Integrating Semantic Retrieval with Generative Reasoning
Mayar Boghdady

pdf bib
TAQEEM 2025: Overview of The First Shared Task for Arabic Quality Evaluation of Essays in Multi-dimensions
May Bashendy | Salam Albatarni | Sohaila Eltanbouly | Walid Massoud | Houda Bouamor | Tamer Elsayed

Automated Essay Scoring (AES) has emerged as a significant research problem in natural language processing, offering valuable tools to support educators in assessing student writing. Motivated by the growing need for reliable Arabic AES systems, we organized the first shared Task for Arabic Quality Evaluation of Essays in Multi-dimensions (TAQEEM), held at the ArabicNLP 2025 conference. TAQEEM 2025 includes two subtasks: Task A on holistic scoring and Task B on trait-specific scoring. It introduces a new (and first of its kind) dataset of 1,265 Arabic essays, annotated with holistic and trait-specific scores, including relevance, organization, vocabulary, style, development, mechanics, and grammar. The main goal of TAQEEM is to address the scarcity of standardized benchmarks and high-quality resources in Arabic AES. TAQEEM 2025 attracted 11 registered teams for Task A and 10 for Task B, with a total of 5 teams across both tasks submitting system runs for evaluation. This paper presents an overview of the task, outlines the approaches employed, and discusses the results of the participating teams.

pdf bib
ARxHYOKA at TAQEEM2025: Comparative Approaches to Arabic Essay Trait Scoring
Mohamad Alnajjar | Ahmad Almoustafa | Tomohiro Nishiyama | Shoko Wakamiya | Eiji Aramaki | Takuya Matsuzaki

pdf bib
912 at TAQEEM 2025: A Distribution-aware Approach to Arabic Essay Scoring
Trong-Tai Dam Vu | Thìn Đặng Văn

pdf bib
Taibah at TAQEEM 2025: Leveraging GPT-4o for Arabic Essay Scoring
Nada Almarwani | Alaa Alharbi | Samah Aloufi

pdf bib
MarsadLab at TAQEEM 2025: Prompt-Aware Lexicon-Enhanced Transformer for Arabic Automated Essay Scoring
Mabrouka Bessghaier | Md. Rafiul Biswas | Amira Dhouib | Wajdi Zaghouani