George Mikros

2026

Depression Detection in Modern Greek
Vivian Stamou | George Mikros | George Markopoulos | Spyridoula Varlokosta
Proceedings of the Sixth Workshop on Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments in cooperation with the MENTAL.ai consortium

Despite advancements in NLP-based mental health screening, research remains predominantly English-centric, leaving under-resourced languages insufficiently explored. This study investigates depression detection in Modern Greek social media through a series of experiments. We benchmark traditional machine learning (ML) models against transformer architectures (GreekBERT, GreekSocialBERT, mBERT, and XLM-R) under two settings: a topic-oriented control corpus and a high-similarity stress-test contrasting a gold case of a depressed user with a matched control. Transformer models consistently outperform ML models (F1 = 0.95) but offer limited interpretability. To address this limitation, we incorporate LIWC-derived psycholinguistic features with SHAP explanations to examine model behavior in relation to established linguistic markers. The analysis reveals linguistic patterns consistent with depressive symptoms, such as reduced work-related engagement, social withdrawal, and the motivational deficits characteristically linked to anhedonia in clinical literature. Overall, the results provide a baseline for depression detection in Modern Greek and underscore the importance of grounding automated screening in clinically interpretable evidence.

The NakbaArchiveClassifier Shared Task on Nakba Image Classification
Alexei Abrahams | Shadi Abudalfa | Mustafa Jarrar | George Mikros
Proceedings of the 2nd International Workshop on Nakba Narratives as Language Resources @ LREC 2026

The proliferation of social media platforms has significantly reshaped how conflicts are documented, generating large-scale visual records that must be structured to enable meaningful analysis. In this paper, we present the NakbaArchiveClassifier shared task, which focuses on binary classification of infrastructure damage in images from Gaza. This task formed part of the Nakba-NLP Workshop at LREC 2026 and is grounded in an ongoing initiative focused on humanitarian archiving. It utilizes a carefully curated dataset of 2,001 images sourced from Palestinian journalists and content creators on Instagram, spanning the period from October 7, 2023 to December 15, 2025. The objective for participants was to classify whether an image depicts damaged or destroyed infrastructure versus intact structures. This task poses multiple challenges, such as the complexity of real-world conflict imagery, imbalance between classes, and the inherent ambiguity present in many visual scenes. The NakbaArchiveClassifier shared task introduces a new benchmark for analyzing conflict-related visual data and provides valuable resources for advancing research in humanitarian AI, crisis analytics, and Arabic digital humanities.

ClimateChat-300K: A Multi-Modal Facebook Dataset for Understanding Diverse Perspectives in Climate Communication
Wajdi Zaghouani | Md. Rafiul Biswas | Mabrouka Bessghaier | Shimaa Amer Ibrahim | George Mikros
Proceedings of the Fifteenth Language Resources and Evaluation Conference

We present ClimateChat-300K, a large-scale dataset of 299,329 public Facebook posts about climate change collected between May 2020 and May 2024 through the CrowdTangle platform. The dataset contains 41 metadata features including post content, engagement metrics, and page attributes, covering material from more than 26,000 global pages. Each post includes rich contextual information such as language, timestamp, page category, and interaction counts, enabling comprehensive analyses of public discourse around climate communication. Using topic modeling and sentiment analysis, we identify ten main themes grouped into five domains: policy, activism, cooperation, science, and conservation. The results reveal that emotional tone, post format, and page identity strongly influence audience engagement, with visually rich and emotionally charged content receiving the highest levels of interaction. The dataset also demonstrates how online discussions evolved in response to major events such as international climate summits and the COVID-19 pandemic period. ClimateChat-300K provides an open resource for reproducible and interdisciplinary research on polarization, misinformation, and the dynamics of digital climate discourse. By releasing this dataset, we aim to support transparent, data-driven research and contribute to a deeper understanding of how public engagement with climate issues develops across time, geography, and institutional contexts.

Large Language Models (LLMs) are increasingly used for Islamic question answering, where ungrounded responses may carry serious religious consequences. Yet standard MCQ/MRC-style evaluations do not capture key real-world failure modes, notably free-form hallucinations and the ability to abstain when evidence is insufficient. To address this gap, we introduce IslamicFaithQA, a 3,810-item bilingual (Arabic/English) **generative** benchmark with atomic single-gold answers, which enables direct measurement of hallucination and abstention. We additionally developed an end-to-end grounded Islamic modeling suite consisting of *(i)* 25K Arabic text-grounded SFT reasoning pairs, *(ii)* 5K bilingual preference samples for reward-guided alignment, and *(iii)* a verse-level Qur’an retrieval corpus of ∼6k atomic *verses* (ayat). Building on these resources, we develop an agentic Quran-grounding framework (agentic RAG) that uses structured tool calls for iterative evidence seeking and answer revision. Experiments across Arabic-centric and multilingual LLMs show that retrieval improves correctness and that agentic RAG yields the largest gains beyond standard RAG, achieving state-of-the-art performance and stronger Arabic–English robustness even with a small model (i.e., Qwen3 4B). We made the datasets publicly available (https://huggingface.co/datasets/QCRI/IslamicFaithQA).

2025

This paper presents a comprehensive overview of the first edition of the Academic Essay Authenticity Challenge, organized as part of the GenAI Content Detection shared tasks collocated with COLING 2025. This challenge focuses on detecting machine-generated vs human-authored essays for academic purposes. The task is defined as follows: “Given an essay, identify whether it is generated by a machine or authored by a human.” The challenge involves two languages: English and Arabic. During the evaluation phase, 25 teams submitted systems for English and 21 teams for Arabic, reflecting substantial interest in the task. Finally, five teams submitted system description papers. The majority of submissions utilized fine-tuned transformer-based models, with one team employing Large Language Models (LLMs) such as Llama 2 and Llama 3. This paper outlines the task formulation, details the dataset construction process, and explains the evaluation framework. Additionally, we present a summary of the approaches adopted by participating teams. Nearly all submitted systems outperformed the n-gram-based baseline, with the top-performing systems achieving F1 scores exceeding 0.98 for both languages, indicating significant progress in the detection of machine-generated text.

This paper presents the MAHED 2025 Shared Task on Multimodal Detection of Hope and Hate Emotions in Arabic Content, comprising three subtasks: (1) text-based classification of Arabic content into hate and hope, (2) multi-task learning for joint prediction of emotions, offensive content, and hate speech, and (3) multimodal detection of hateful content in Arabic memes. We provide three high-quality datasets totaling over 22,000 instances sourced from social media platforms, annotated by native Arabic speakers with Cohen’s Kappa exceeding 0.85. Our evaluation attracted 46 leaderboard submissions from participants, with systems leveraging Arabic-specific pre-trained language models (AraBERT, MARBERT), large language models (GPT-4, Gemini), and multimodal fusion architectures combining CLIP vision encoders with Arabic text models. The best-performing systems achieved macro F1-scores of 0.723 (Task 1), 0.578 (Task 2), and 0.796 (Task 3), with top teams employing ensemble methods, class-weighted training, and OCR-aware multimodal fusion. Analysis reveals persistent challenges in dialectal robustness and minority class detection for hope speech, and highlights key directions for future Arabic content moderation research.

We present ImageEval 2025, the first shared task dedicated to Arabic image captioning. The task addresses the critical gap in multimodal Arabic NLP by focusing on two complementary subtasks: (1) creating the first open-source, manually-captioned Arabic image dataset through a collaborative datathon, and (2) developing and evaluating Arabic image captioning models. A total of 44 teams registered, of which eight submitted during the test phase, producing 111 valid submissions. Evaluation was conducted using automatic metrics, LLM-based judgment, and human assessment. In Subtask 1, the best-performing system achieved a cosine similarity of 65.5, while in Subtask 2, the top score was 60.0. Although these results show encouraging progress, they also confirm that Arabic image captioning remains a challenging task, particularly due to cultural grounding requirements, morphological richness, and dialectal variation. All datasets, baseline models, and evaluation tools are released publicly to support future research in Arabic multimodal NLP.

2024

Establishing Control Corpora for Depression Detection in Modern Greek: Methodological Insights
Vivian Stamou | George Mikros | George Markopoulos | Spyridoula Varlokosta
Proceedings of the Fifth Workshop on Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments @LREC-COLING 2024

This paper presents a methodological approach for establishing control corpora in the context of depression detection in the Modern Greek language. We discuss various methods used to create control corpora, focusing on the challenge of selecting representative samples from the general population when the target reference is the depressed population. Our approach includes traditional random selection among Twitter users, as well as an innovative method for creating topic-oriented control corpora. Through this study, we provide insights into the development of control corpora, offering valuable considerations for researchers working on similar projects in linguistic analysis and mental health studies. In addition, we identify several dominant topics in the depressed population such as religion, sentiments, health and digestion, which seem to align with findings consistently reported in the literature.