Orfeas Menis Mastromichalakis


2023

Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors
George Filandrianos | Edmund Dervakos | Orfeas Menis Mastromichalakis | Chrysoula Zerva | Giorgos Stamou
Findings of the Association for Computational Linguistics: ACL 2023

In the wake of responsible AI, interpretability methods, which attempt to provide an explanation for the predictions of neural models, have seen rapid progress. In this work, we are concerned with explanations that are applicable to natural language processing (NLP) models and tasks, and we focus specifically on the analysis of counterfactual, contrastive explanations. We note that while several explainers have been proposed to produce counterfactual explanations, their behaviour can vary significantly, and the lack of a universal ground truth for the counterfactual edits imposes an insuperable barrier on their evaluation. We propose a new back-translation-inspired evaluation methodology that utilises earlier outputs of the explainer as ground truth proxies to investigate the consistency of explainers. We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models, and infer patterns that would otherwise be obscured. Using this methodology, we conduct a thorough analysis and propose a novel metric to evaluate the consistency of counterfactual generation approaches with different characteristics across available performance indicators.

2022

Towards Explainable Evaluation of Language Models on the Semantic Similarity of Visual Concepts
Maria Lymperaiou | George Manoliadis | Orfeas Menis Mastromichalakis | Edmund G. Dervakos | Giorgos Stamou
Proceedings of the 29th International Conference on Computational Linguistics

Recent breakthroughs in NLP research, such as the advent of Transformer models, have indisputably contributed to major advancements in several tasks. However, few works investigate the robustness and explainability of their evaluation strategies. In this work, we examine the behavior of high-performing pre-trained language models, focusing on the task of semantic similarity for visual vocabularies. First, we address the need for explainable evaluation metrics, which are necessary for understanding the conceptual quality of retrieved instances. Our proposed metrics provide valuable insights at both the local and global level, showcasing the shortcomings of widely used approaches. Second, adversarial interventions on salient query semantics expose vulnerabilities of opaque metrics and highlight patterns in learned linguistic representations.