Mahdi Dhaini

2025

pdf bib abs
Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models
Juraj Vladika | Mahdi Dhaini | Florian Matthes
Findings of the Association for Computational Linguistics: EMNLP 2025

The growing capabilities of Large Language Models (LLMs) can enhance healthcare by assisting medical researchers, physicians, and improving access to health services for patients. LLMs encode extensive knowledge within their parameters, including medical knowledge derived from many sources. However, the knowledge in LLMs can become outdated over time, posing challenges in keeping up with evolving medical recommendations and research. This can lead to LLMs providing outdated health advice or failures in medical reasoning tasks. To address this gap, our study introduces two novel biomedical question-answering (QA) datasets derived from medical systematic literature reviews: MedRevQA, a general dataset of 16,501 biomedical QA pairs, and MedChangeQA, a subset of 512 QA pairs whose verdict changed though time. By evaluating the performance of eight popular LLMs, we find that all models exhibit memorization of outdated knowledge to some extent. We provide deeper insights and analysis, paving the way for future research on this challenging aspect of LLMs.

pdf bib abs
From Regulation to Interaction: Expert Views on Aligning Explainable AI with the EU AI Act
Mahdi Dhaini | Lukas Ondrus | Gjergji Kasneci
Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP)

Explainable AI (XAI) aims to support people who interact with high-stakes AI-driven decisions, and the EU AI Act mandates that users must be able to interpret system outputs appropriately. Although the Act requires users to interpret outputs and mandates human oversight, it offers no technical guidance for implementing explainability, leaving interpretability methods opaque to non-experts and compliance obligations unclear. To address these gaps, we interviewed eight experts to explore (1) how explainability is defined and perceived under the Act, (2) the practical and regulatory obstacles to XAI implementation, and (3) recommended solutions and future directions. Our findings reveal that experts view explainability as context- and audience-dependent, face challenges from regulatory vagueness and technical trade-offs, and advocate for domain-specific rules, hybrid methods, and user-centered explanations. These insights provide a basis for a potential framework to align XAI methods—particularly for AI and Natural Language Processing (NLP) systems—with regulatory requirements, and suggest actionable steps for policymakers and practitioners

2024

pdf bib abs
Explainability Meets Text Summarization: A Survey
Mahdi Dhaini | Ege Erdogan | Smarth Bakshi | Gjergji Kasneci
Proceedings of the 17th International Natural Language Generation Conference

Summarizing long pieces of text is a principal task in natural language processing with Machine Learning-based text generation models such as Large Language Models (LLM) being particularly suited to it. Yet these models are often used as black-boxes, making them hard to interpret and debug. This has led to calls by practitioners and regulatory bodies to improve the explainability of such models as they find ever more practical use. In this survey, we present a dual-perspective review of the intersection between explainability and summarization by reviewing the current state of explainable text summarization and also highlighting how summarization techniques are effectively employed to improve explanations.

2023

pdf bib abs
Detecting ChatGPT: A Survey of the State of Detecting ChatGPT-Generated Text
Mahdi Dhaini | Wessel Poelman | Ege Erdogan
Proceedings of the 8th Student Research Workshop associated with the International Conference Recent Advances in Natural Language Processing

While recent advancements in the capabilities and widespread accessibility of generative language models, such as ChatGPT (OpenAI, 2022), have brought about various benefits by generating fluent human-like text, the task of distinguishing between human- and large language model (LLM) generated text has emerged as a crucial problem. These models can potentially deceive by generating artificial text that appears to be human-generated. This issue is particularly significant in domains such as law, education, and science, where ensuring the integrity of text is of the utmost importance. This survey provides an overview of the current approaches employed to differentiate between texts generated by humans and ChatGPT. We present an account of the different datasets constructed for detecting ChatGPT-generated text, the various methods utilized, what qualitative analyses into the characteristics of human versus ChatGPT-generated text have been performed, and finally, summarize our findings into general insights.

Co-authors

Wessel Poelman 1

Juraj Vladika 1

Venues

Fix data