Saeb Ganideh

2026

LLM-induced Rationales for More Compact Explainable Style Classification Models
Ahmad Aljanaideh | Saeb Ganideh
Findings of the Association for Computational Linguistics: ACL 2026

The complexity of recent natural language classification models has led to interest in methods for improving the performance of explainable models (e.g., Logistic Regression). Existing methods cluster word embeddings to discover fine-grained contextual features that can be used to train a linear model. While these methods help narrow the performance gap between black-box and explainable models, they rely on discovering a large number of features, which hurts interpretability. In this work, we propose a model that leverages Large Language Models (LLMs) and clustering algorithms to discover a compact set of interpretable features. The proposed model first uses GPT-4o mini to extract rationales (i.e., phrases that explain an item’s label) from labeled text, and then clusters those rationales to obtain a compact, interpretable feature space. Across three style classification tasks, the resulting features achieve performance comparable to word-cluster baselines on most tasks, while reducing the number of features by 85–99%. These results highlight the potential of LLMs to improve the compactness of explainable AI models.
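
The rationale-then-cluster idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the rationale phrases are hand-written stand-ins for LLM output, the politeness-style examples are hypothetical, and a greedy token-overlap (Jaccard) grouping is swapped in for whatever clustering algorithm the paper uses. The helper names `cluster_rationales` and `featurize` are also assumptions made for illustration.

```python
# Toy sketch: rationale phrases -> clusters -> compact binary features.
# All data is made up; in the paper the rationales come from GPT-4o mini.

def tokens(phrase):
    """Lowercased set of whitespace-separated tokens."""
    return set(phrase.lower().split())

def jaccard(a, b):
    """Token-set overlap between two phrases, in [0, 1]."""
    return len(a & b) / len(a | b)

def cluster_rationales(rationales, threshold=0.3):
    """Greedy single-pass grouping (an assumption, not the paper's method):
    a phrase joins the first cluster whose seed phrase overlaps enough,
    otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of phrases; index 0 is the seed
    for phrase in rationales:
        for cluster in clusters:
            if jaccard(tokens(phrase), tokens(cluster[0])) >= threshold:
                cluster.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters

def featurize(text, clusters):
    """One binary feature per cluster: 1 if the text contains any phrase in it."""
    lowered = text.lower()
    return [int(any(p.lower() in lowered for p in c)) for c in clusters]

# Hypothetical rationales, standing in for what an LLM might extract.
rationales = [
    "could you please",
    "would you please",
    "thank you so much",
    "thank you very much",
    "you must",
]
clusters = cluster_rationales(rationales)

texts = ["Could you please send the file?", "You must fix this now."]
features = [featurize(t, clusters) for t in texts]

print(len(rationales), "rationales ->", len(clusters), "features")
for text, row in zip(texts, features):
    print(row, text)
```

Each text ends up with one feature per cluster rather than one per phrase or word, which is the compactness the abstract refers to; the resulting vectors could then be passed to any linear classifier.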

Co-authors

Ahmad Aljanaideh 1

Venues

Findings 1