Patrick Altmeyer


2025

Natural Language Counterfactual Explanations in Financial Text Classification: A Comparison of Generators and Evaluation Metrics
Karol Dobiczek | Patrick Altmeyer | Cynthia C. S. Liem
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)

The use of large language model (LLM) classifiers in finance and other high-stakes domains calls for a high level of trustworthiness and explainability. We focus on counterfactual explanations (CE), a form of explainable AI that explains a model’s output by proposing an alternative to the original input that changes the classification. We use three types of CE generators for LLM classifiers and assess the quality of their explanations on a recent dataset consisting of central bank communications. We compare the generators using a selection of quantitative and qualitative metrics. Our findings suggest that non-expert and expert evaluators prefer CE methods that apply minimal changes; however, the methods we analyze might not handle the domain-specific vocabulary well enough to generate plausible explanations. We discuss shortcomings in the choice of evaluation metrics in the literature on text CE generators and propose refined definitions of the fluency and plausibility qualitative metrics.

2024

Conformal Intent Classification and Clarification for Fast and Accurate Intent Recognition
Floris den Hengst | Ralf Wolter | Patrick Altmeyer | Arda Kaygan
Findings of the Association for Computational Linguistics: NAACL 2024

We present Conformal Intent Classification and Clarification (CICC), a framework for fast and accurate intent classification for task-oriented dialogue systems. The framework turns heuristic uncertainty scores of any intent classifier into a clarification question that is guaranteed to contain the true intent at a pre-defined confidence level. By disambiguating between a small number of likely intents, the user query can be resolved quickly and accurately. Additionally, we propose to augment the framework for out-of-scope detection. In a comparative evaluation using seven intent recognition datasets we find that CICC generates small clarification questions and is capable of out-of-scope detection. CICC can help practitioners and researchers substantially in improving the user experience of dialogue agents with specific clarification questions.
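
The abstract above describes turning classifier scores into a set of candidate intents that contains the true intent at a chosen confidence level. One generic way to get such a set is split conformal prediction, sketched below. This is an illustration only and not necessarily the procedure used in the paper: the function names, the nonconformity score (one minus the probability of the true intent), and the dictionary input format are all assumptions made for this example.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Compute a score threshold from held-out calibration data.

    cal_probs:  list of dicts mapping intent -> predicted probability (assumed format)
    cal_labels: list of true intents, aligned with cal_probs
    alpha:      allowed error rate, e.g. 0.1 for 90% coverage
    """
    # Nonconformity score (an assumption here): 1 - probability of the true intent.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Rank of the finite-sample-corrected quantile used in split conformal prediction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: fall back to including every intent.
        return float("inf")
    return scores[k - 1]


def prediction_set(probs, qhat):
    """Return all intents whose score is within the threshold, most likely first."""
    kept = [y for y, p in probs.items() if 1.0 - p <= qhat]
    return sorted(kept, key=lambda y: -probs[y])
```

In a dialogue system, a set with one intent could be acted on directly, while a set with a few intents could be turned into a clarification question listing those options. How the paper handles large sets or out-of-scope queries is not shown here.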