George Mikros
2026
From RAG to Agentic RAG for Faithful Islamic Question Answering
Gagan Bhatia | Hamdy Mubarak | Mustafa Jarrar | George Mikros | Fadi Zaraket | Mahmoud Alhirthani | Mutaz al-Khatib | Logan Cochrane | Kareem Mohamed Darwish | Rashid Yahiaoui | Firoj Alam
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) are increasingly used for Islamic question answering, where ungrounded responses may carry serious religious consequences. Yet standard MCQ/MRC-style evaluations do not capture key real-world failure modes, notably free-form hallucinations and the ability to abstain when evidence is insufficient. To address this gap, we introduce IslamicFaithQA, a 3,810-item bilingual (Arabic/English) **generative** benchmark with atomic single-gold answers, which enables direct measurement of hallucination and abstention. We additionally develop an end-to-end grounded Islamic modeling suite consisting of *(i)* 25K Arabic text-grounded SFT reasoning pairs, *(ii)* 5K bilingual preference samples for reward-guided alignment, and *(iii)* a verse-level Qur’an retrieval corpus of ∼6k atomic *verses* (ayat). Building on these resources, we develop an agentic Quran-grounding framework (agentic RAG) that uses structured tool calls for iterative evidence seeking and answer revision. Experiments across Arabic-centric and multilingual LLMs show that retrieval improves correctness and that agentic RAG yields the largest gains beyond standard RAG, achieving state-of-the-art performance and stronger Arabic–English robustness even with a small model (i.e., Qwen3 4B). The datasets are publicly available (https://huggingface.co/datasets/QCRI/IslamicFaithQA).
2025
ImageEval 2025: The First Arabic Image Captioning Shared Task
Ahlam Bashiti | Alaa Aljabari | Hadi Hamoud | Md. Rafiul Biswas | Bilal Shalash | Mustafa Jarrar | Fadi Zaraket | George Mikros | Ehsaneddin Asgari | Wajdi Zaghouani
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
GenAI Content Detection Task 2: AI vs. Human – Academic Essay Authenticity Challenge
Shammur Absar Chowdhury | Hind Almerekhi | Mucahid Kutlu | Kaan Efe Keleş | Fatema Ahmad | Tasnim Mohiuddin | George Mikros | Firoj Alam
Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)
This paper presents a comprehensive overview of the first edition of the Academic Essay Authenticity Challenge, organized as part of the GenAI Content Detection shared tasks collocated with COLING 2025. This challenge focuses on detecting machine-generated vs human-authored essays for academic purposes. The task is defined as follows: “Given an essay, identify whether it is generated by a machine or authored by a human.” The challenge involves two languages: English and Arabic. During the evaluation phase, 25 teams submitted systems for English and 21 teams for Arabic, reflecting substantial interest in the task. Finally, five teams submitted system description papers. The majority of submissions utilized fine-tuned transformer-based models, with one team employing Large Language Models (LLMs) such as Llama 2 and Llama 3. This paper outlines the task formulation, details the dataset construction process, and explains the evaluation framework. Additionally, we present a summary of the approaches adopted by participating teams. Nearly all submitted systems outperformed the n-gram-based baseline, with the top-performing systems achieving F1 scores exceeding 0.98 for both languages, indicating significant progress in the detection of machine-generated text.
Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)
Firoj Alam | Preslav Nakov | Nizar Habash | Iryna Gurevych | Shammur Chowdhury | Artem Shelmanov | Yuxia Wang | Ekaterina Artemova | Mucahid Kutlu | George Mikros
Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)
MAHED Shared Task: Multimodal Detection of Hope and Hate Emotions in Arabic Content
Wajdi Zaghouani | Md. Rafiul Biswas | Mabrouka Bessghaier | Shimaa Ibrahim | George Mikros | Abul Hasnat | Firoj Alam
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
2024
Establishing Control Corpora for Depression Detection in Modern Greek: Methodological Insights
Vivian Stamou | George Mikros | George Markopoulos | Spyridoula Varlokosta
Proceedings of the Fifth Workshop on Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments @LREC-COLING 2024
This paper presents a methodological approach for establishing control corpora in the context of depression detection in the Modern Greek language. We discuss various methods used to create control corpora, focusing on the challenge of selecting representative samples from the general population when the target reference is the depressed population. Our approach includes traditional random selection among Twitter users, as well as an innovative method for creating topic-oriented control corpora. Through this study, we provide insights into the development of control corpora, offering valuable considerations for researchers working on similar projects in linguistic analysis and mental health studies. In addition, we identify several dominant topics in the depressed population, such as religion, sentiments, health and digestion, which seem to align with findings consistently reported in the literature.
2002
Quantitative parameters in corpus design: Estimating the optimum text size in Modern Greek language
George Mikros
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
2000
Co-authors
- Firoj Alam 4
- Md. Rafiul Biswas 2
- Shammur Absar Chowdhury 2
- Mustafa Jarrar 2
- Mucahid Kutlu 2
- Wajdi Zaghouani 2
- Fadi A. Zaraket 2
- Fatema Ahmad 1
- Mahmoud Alhirthani 1
- Alaa Aljabari 1
- Hind Almerekhi 1
- Ekaterina Artemova 1
- Ehsaneddin Asgari 1
- Ahlam Bashiti 1
- Mabrouka Bessghaier 1
- Gagan Bhatia 1
- George Carayannis 1
- Logan Cochrane 1
- Kareem Mohamed Darwish 1
- Iryna Gurevych 1
- Nizar Habash 1
- Hadi Hamoud 1
- Abul Hasnat 1
- Shimaa Ibrahim 1
- Kaan Efe Keleş 1
- George Markopoulos 1
- Muhammad Tasnim Mohiuddin 1
- Hamdy Mubarak 1
- Preslav Nakov 1
- Bilal Shalash 1
- Artem Shelmanov 1
- Vivian Stamou 1
- Spyridoula Varlokosta 1
- Yuxia Wang 1
- Rashid Yahiaoui 1
- Mutaz al-Khatib 1