Athanasios Voulodimos


2025

Pitfalls of Scale: Investigating the Inverse Task of Redefinition in Large Language Models
Elena Stringli | Maria Lymperaiou | Giorgos Filandrianos | Athanasios Voulodimos | Giorgos Stamou
Findings of the Association for Computational Linguistics: ACL 2025

Inverse tasks can uncover potential reasoning gaps as Large Language Models (LLMs) scale up. In this work, we explore the redefinition task, in which we assign alternative values to well-known physical constants and units of measure, prompting LLMs to respond accordingly. Our findings show that not only does model performance degrade with scale, but its false confidence also rises. Moreover, while factors such as prompting strategies or response formatting are influential, they do not preclude LLMs from anchoring to memorized values.

AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection
Dimitra Karkani | Maria Lymperaiou | George Filandrianos | Nikolaos Spanos | Athanasios Voulodimos | Giorgos Stamou
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Multilingual hallucination detection stands as an underexplored challenge, which the Mu-SHROOM shared task seeks to address. In this work, we propose an efficient, training-free LLM prompting strategy that enhances detection by translating multilingual text spans into English. Our approach achieves competitive rankings across multiple languages, securing two first positions in low-resource languages. The consistency of our results highlights the effectiveness of our translation strategy for hallucination detection, demonstrating its applicability regardless of the source language.

AILS-NTUA at SemEval-2025 Task 4: Parameter-Efficient Unlearning for Large Language Models using Data Chunking
Iraklis Premptis | Maria Lymperaiou | George Filandrianos | Orfeas Menis Mastromichalakis | Athanasios Voulodimos | Giorgos Stamou
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

The "Unlearning Sensitive Content from Large Language Models" task aims to remove targeted datapoints from trained models while minimally affecting their general knowledge. In our work, we leverage parameter-efficient, gradient-based unlearning using low-rank (LoRA) adaptation and layer-focused fine-tuning. To further enhance unlearning effectiveness, we employ data chunking, splitting forget data into disjoint partitions and merging them with cyclically sampled retain samples at a pre-defined ratio. Our task-agnostic method achieves an outstanding forget-retain balance, ranking first on leaderboards and significantly outperforming baselines and competing systems.
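
The data chunking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `chunk_forget_with_retain`, the parameters `num_chunks` and `retain_per_forget`, and the choice of contiguous partitions are all assumptions made for illustration, and the abstract does not state the actual ratio used.

```python
from itertools import cycle, islice
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_forget_with_retain(
    forget: Sequence[T],
    retain: Sequence[T],
    num_chunks: int,
    retain_per_forget: int,
) -> List[List[T]]:
    """Split `forget` into disjoint partitions and add retain samples to each.

    Hypothetical helper: `retain_per_forget` stands in for the "pre-defined
    ratio" (retain samples added per forget sample); its value is an assumption.
    """
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    if retain_per_forget < 0:
        raise ValueError("retain_per_forget must be non-negative")
    if retain_per_forget > 0 and not retain:
        raise ValueError("retain must be non-empty when retain_per_forget > 0")

    # Cycling over the retain set means sampling wraps around to the start
    # once the retain samples are exhausted.
    retain_stream = cycle(retain)

    # Spread the forget samples as evenly as possible over the partitions.
    base, extra = divmod(len(forget), num_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        part = list(forget[start:start + size])
        start += size
        if not part:
            continue  # fewer forget samples than partitions
        n_retain = len(part) * retain_per_forget
        chunks.append(part + list(islice(retain_stream, n_retain)))
    return chunks


forget_set = ["f0", "f1", "f2", "f3", "f4"]
retain_set = ["r0", "r1", "r2"]
chunks = chunk_forget_with_retain(forget_set, retain_set, num_chunks=2, retain_per_forget=1)
```

In this sketch each forget sample appears in exactly one chunk, while retain samples are reused across chunks as the cycle wraps around. How the chunks are then used during unlearning is outside what the abstract specifies.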

AILS-NTUA at SemEval-2025 Task 8: Language-to-Code prompting and Error Fixing for Tabular Question Answering
Andreas Evangelatos | George Filandrianos | Maria Lymperaiou | Athanasios Voulodimos | Giorgos Stamou
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

In this paper, we present our submission to SemEval-2025 Task 8: Question Answering over Tabular Data. This task, evaluated on the DataBench dataset, assesses Large Language Models’ (LLMs) ability to answer natural language questions over structured data while addressing topic diversity and table size limitations in previous benchmarks. We propose a system that employs effective LLM prompting to translate natural language queries into executable code, enabling accurate responses, error correction, and interpretability. Our approach ranks first in both subtasks of the competition in the proprietary model category, significantly outperforming the organizer’s baseline.
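
The general idea of translating a question into executable code and retrying on errors can be sketched as follows. This is an illustrative loop under stated assumptions, not the submitted system: `answer_with_error_fixing`, the `generate_code` callable, the convention that generated code defines a function named `answer`, and the stub `fake_llm` are all hypothetical, and the table is a plain list of dictionaries rather than a DataBench table.

```python
import traceback
from typing import Any, Callable, Dict, List, Optional

Table = List[Dict[str, Any]]


def answer_with_error_fixing(
    question: str,
    table: Table,
    generate_code: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> Any:
    """Run generated code on `table`, feeding any error back to the generator.

    `generate_code(question, error)` is a hypothetical stand-in for an LLM call;
    it must return Python source that defines `answer(table)`.
    """
    error: Optional[str] = None
    for _ in range(max_attempts):
        code = generate_code(question, error)
        namespace: Dict[str, Any] = {}
        try:
            # exec is used only for illustration; generated code should be
            # sandboxed in any real setting.
            exec(code, namespace)
            return namespace["answer"](table)
        except Exception:
            # Keep the traceback so the next attempt can try to fix the bug.
            error = traceback.format_exc()
    raise RuntimeError(f"no working code after {max_attempts} attempts:\n{error}")


def fake_llm(question: str, error: Optional[str]) -> str:
    """Stub generator: the first attempt uses a wrong column name, the retry fixes it."""
    column = "prices" if error is None else "price"
    return f"def answer(table):\n    return sum(row['{column}'] for row in table)\n"


result = answer_with_error_fixing("What is the total price?", [{"price": 2}, {"price": 3}], fake_llm)
```

Here the first generated program raises a `KeyError`, the traceback is passed back to the generator, and the second program runs successfully. The executed code also serves as a readable trace of how the answer was obtained.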