Counterfactual Simulatability of LLM Explanations for Generation Tasks
Marvin Limpijankit | Yanda Chen | Melanie Subbiah | Nicholas Deas | Kathleen McKeown
Proceedings of the 18th International Natural Language Generation Conference, 2025
LLMs can be unpredictable: even slight alterations to the prompt can cause the output to change in unexpected ways. The ability of models to accurately explain their behavior is therefore critical, especially in high-stakes settings. Counterfactual simulatability measures how well an explanation allows users to infer the model's output on related counterfactual inputs, and has previously been studied for yes/no question answering. We provide a general framework for extending this method to generation tasks, using news summarization and medical suggestion as example use cases. We find that while LLM explanations do enable users to better predict the model's outputs on counterfactuals in the summarization setting, there is significant room for improvement in the medical suggestion setting. Furthermore, our results suggest that evaluating counterfactual simulatability may be more appropriate for skill-based tasks than for knowledge-based tasks.
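To make the metric concrete, here is a minimal sketch of the counterfactual simulatability evaluation loop for a generation task, under stated assumptions rather than the paper's actual implementation: the helper names (generate_counterfactuals, simulate, outputs_match) are hypothetical placeholders, and for generation tasks the match test would plausibly be a graded similarity or entailment check rather than exact string equality.

```python
"""Sketch of a counterfactual-simulatability evaluation loop for a
generation task. All helper callables are hypothetical placeholders."""

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    prompt: str       # original input (e.g., an article to summarize)
    output: str       # the model's output on the original input
    explanation: str  # the model's explanation of that output


def simulatability_precision(
    example: Example,
    model: Callable[[str], str],                            # LLM under evaluation
    generate_counterfactuals: Callable[[str], List[str]],   # perturbs the prompt
    simulate: Callable[[Example, str], str],                # simulator's guess from explanation
    outputs_match: Callable[[str, str], bool],              # e.g., similarity above a threshold
) -> float:
    """Fraction of counterfactual prompts on which the simulator's guess,
    made from the original output and explanation alone, matches what the
    model actually produces. Higher = more simulatable explanation."""
    counterfactuals = generate_counterfactuals(example.prompt)
    if not counterfactuals:
        return 0.0
    hits = 0
    for cf_prompt in counterfactuals:
        predicted = simulate(example, cf_prompt)  # simulator never queries the model
        actual = model(cf_prompt)                 # ground-truth model behavior
        if outputs_match(predicted, actual):
            hits += 1
    return hits / len(counterfactuals)
```

The key design point the sketch illustrates is that the simulator sees only the original input, output, and explanation, so the score isolates how much predictive information the explanation itself carries about the model's behavior on related inputs.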