Maria Lymperaiou

2024

pdf bib abs
Optimal and efficient text counterfactuals using Graph Neural Networks
Dimitris Lymperopoulos | Maria Lymperaiou | Giorgos Filandrianos | Giorgos Stamou
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

As NLP models become increasingly integral to decision-making processes, the need for explainability and interpretability has become paramount. In this work, we propose a framework that achieves the aforementioned by generating semantically edited inputs, known as counterfactual interventions, which change the model prediction, thus providing a form of counterfactual explanations for the model. We frame the search for optimal counterfactual interventions as a graph assignment problem and employ a GNN to solve it, thus achieving high efficiency. We test our framework on two NLP tasks - binary sentiment classification and topic classification - and show that the generated edits are contrastive, fluent and minimal, while the whole process remains significantly faster than other state-of-the-art counterfactual editors.

pdf abs
Enhancing adversarial robustness in Natural Language Inference using explanations
Alexandros Koulakos | Maria Lymperaiou | Giorgos Filandrianos | Giorgos Stamou
Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

The surge of state-of-the-art transformer-based models has undoubtedly pushed the limits of NLP model performance, excelling in a variety of tasks. We cast the spotlight on the underexplored task of Natural Language Inference (NLI), since models trained on popular well-suited datasets are susceptible to adversarial attacks, allowing subtle input interventions to mislead the model. In this work, we validate the usage of natural language explanation as a model-agnostic defence strategy through extensive experimentation: only by fine-tuning a classifier on the explanation rather than premise-hypothesis inputs, robustness under various adversarial attacks is achieved in comparison to explanation-free baselines. Moreover, since there is no standard strategy for testing the semantic validity of the generated explanations, we research the correlation of widely used language generation metrics with human perception, in order for them to serve as a proxy towards robust NLI models. Our approach is resource-efficient and reproducible without significant computational limitations.

pdf abs
Puzzle Solving using Reasoning of Large Language Models: A Survey
Panagiotis Giadikiaroglou | Maria Lymperaiou | Giorgos Filandrianos | Giorgos Stamou
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Exploring the capabilities of Large Language Models (LLMs) in puzzle solving unveils critical insights into their potential and challenges in AI, marking a significant step towards understanding their applicability in complex reasoning tasks. This survey leverages a unique taxonomy—dividing puzzles into rule-based and rule-less categories—to critically assess LLMs through various methodologies, including prompting techniques, neuro-symbolic approaches, and fine-tuning. Through a critical review of relevant datasets and benchmarks, we assess LLMs’ performance, identifying significant challenges in complex puzzle scenarios. Our findings highlight the disparity between LLM capabilities and human-like reasoning, particularly in those requiring advanced logical inference. The survey underscores the necessity for novel strategies and richer datasets to advance LLMs’ puzzle-solving proficiency and contribute to AI’s logical reasoning and creative problem-solving advancements.

pdf abs
”I Never Said That”: A dataset, taxonomy and baselines on response clarity classification
Konstantinos Thomas | Giorgos Filandrianos | Maria Lymperaiou | Chrysoula Zerva | Giorgos Stamou
Findings of the Association for Computational Linguistics: EMNLP 2024

Equivocation and ambiguity in public speech are well-studied discourse phenomena, especially in political science and analysis of political interviews. Inspired by the well-grounded theory on equivocation, we aim to resolve the closely related problem of response clarity in questions extracted from political interviews, leveraging the capabilities of Large Language Models (LLMs) and human expertise. To this end, we introduce a novel taxonomy that frames the task of detecting and classifying response clarity and a corresponding clarity classification dataset which consists of question-answer (QA) pairs drawn from political interviews and annotated accordingly. Our proposed two-level taxonomy addresses the clarity of a response in terms of the information provided for a given question (high-level) and also provides a fine-grained taxonomy of evasion techniques that relate to unclear, ambiguous responses (lower-level). We combine ChatGPT and human annotators to collect, validate and annotate discrete QA pairs from political interviews, to be used for our newly introduced response clarity task. We provide a detailed analysis and conduct several experiments with different model architectures, sizes and adaptation methods to gain insights and establish new baselines over the proposed dataset and task.

pdf abs
AILS-NTUA at SemEval-2024 Task 6: Efficient model tuning for hallucination detection and analysis
Natalia Grigoriadou | Maria Lymperaiou | George Filandrianos | Giorgos Stamou
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper, we present our team’s submissions for SemEval-2024 Task-6 - SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. The participants were asked to perform binary classification to identify cases of fluent overgeneration hallucinations. Our experimentation included fine-tuning a pre-trained model on hallucination detection and a Natural Language Inference (NLI) model. The most successful strategy involved creating an ensemble of these models, resulting in accuracy rates of 77.8% and 79.9% on model-agnostic and model-aware datasets respectively, outperforming the organizers’ baseline and achieving notable results when contrasted with the top-performing results in the competition, which reported accuracies of 84.7% and 81.3% correspondingly.

pdf abs
AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles
Ioannis Panagiotopoulos | George Filandrianos | Maria Lymperaiou | Giorgos Stamou
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

In this paper, we outline our submission for the SemEval-2024 Task 9 competition: ‘BRAINTEASER: A Novel Task Defying Common Sense’. We engage in both sub-tasks: Sub-task A-Sentence Puzzle and Sub-task B-Word Puzzle. We evaluate a plethora of pre-trained transformer-based language models of different sizes through fine-tuning. Subsequently, we undertake an analysis of their scores and responses to aid future researchers in understanding and utilizing these models effectively. Our top-performing approaches secured competitive positions on the competition leaderboard across both sub-tasks. In the evaluation phase, our best submission attained an average accuracy score of 81.7% in the Sentence Puzzle, and 85.4% in the Word Puzzle, significantly outperforming the best neural baseline (ChatGPT) by more than 20% and 30% respectively.

2023

pdf abs
Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation
Anastasia Kritharoula | Maria Lymperaiou | Giorgos Stamou
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Visual Word Sense Disambiguation (VWSD) is a novel challenging task with the goal of retrieving an image among a set of candidates, which better represents the meaning of an ambiguous word within a given context. In this paper, we make a substantial step towards unveiling this interesting task by applying a varying set of approaches. Since VWSD is primarily a text-image retrieval task, we explore the latest transformer-based methods for multimodal retrieval. Additionally, we utilize Large Language Models (LLMs) as knowledge bases to enhance the given phrases and resolve ambiguity related to the target word. We also study VWSD as a unimodal problem by converting to text-to-text and image-to-image retrieval, as well as question-answering (QA), to fully explore the capabilities of relevant models. To tap into the implicit knowledge of LLMs, we experiment with Chain-of-Thought (CoT) prompting to guide explainable answer generation. On top of all, we train a learn to rank (LTR) model in order to combine our different modules, achieving competitive ranking results. Extensive experiments on VWSD demonstrate valuable insights to effectively drive future directions.

2022

pdf abs
Towards Explainable Evaluation of Language Models on the Semantic Similarity of Visual Concepts
Maria Lymperaiou | George Manoliadis | Orfeas Menis Mastromichalakis | Edmund G. Dervakos | Giorgos Stamou
Proceedings of the 29th International Conference on Computational Linguistics

Recent breakthroughs in NLP research, such as the advent of Transformer models have indisputably contributed to major advancements in several tasks. However, few works research robustness and explainability issues of their evaluation strategies. In this work, we examine the behavior of high-performing pre-trained language models, focusing on the task of semantic similarity for visual vocabularies. First, we address the need for explainable evaluation metrics, necessary for understanding the conceptual quality of retrieved instances. Our proposed metrics provide valuable insights in local and global level, showcasing the inabilities of widely used approaches. Secondly, adversarial interventions on salient query semantics expose vulnerabilities of opaque metrics and highlight patterns in learned linguistic representations.