Data Descriptions from Large Language Models with Influence Estimation

Chaeri Kim, Jaeyeon Bae, Taehwan Kim


Abstract
Deep learning models have been successful in many areas, but understanding their behavior remains a challenge. Most prior explainable AI (XAI) approaches focus on interpreting how models make predictions. In contrast, we introduce a novel approach that identifies the textual descriptions most beneficial for model training. By analyzing which descriptions contribute most effectively to training, our method can provide insight into how the model prioritizes and utilizes information for decision-making. To achieve this, we propose a pipeline that generates textual descriptions using large language models, incorporates external knowledge bases, and refines them through influence estimation and CLIP score. Furthermore, leveraging the phenomenon of cross-modal transferability, we propose a novel benchmark task, cross-modal transfer classification, to examine the effectiveness of our textual descriptions. In zero-shot experiments, our textual descriptions improve classification accuracy over baselines, yielding consistent performance gains across nine image classification datasets. Understanding which descriptions contribute most to model performance also sheds light on how the model utilizes textual information in its decision-making.
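The CLIP-score refinement step mentioned in the abstract can be illustrated with a minimal sketch: CLIP matches an image and a caption by the cosine similarity of their embeddings, so candidate descriptions can be ranked by that score against a class's image embedding. The code below is a toy stand-in (the function names and the random vectors standing in for CLIP embeddings are assumptions for illustration, not the authors' implementation, which would use a real CLIP encoder and additionally apply influence estimation).

```python
import numpy as np

def clip_style_score(image_emb, text_emb):
    """Cosine similarity between an image embedding and a text embedding,
    the quantity a CLIP score is based on."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(image_emb @ text_emb)

def rank_descriptions(image_emb, candidate_embs):
    """Rank candidate description embeddings by their CLIP-style score
    against an image embedding; highest-scoring first."""
    scores = [clip_style_score(image_emb, t) for t in candidate_embs]
    order = np.argsort(scores)[::-1]
    return [(int(i), scores[int(i)]) for i in order]

# Toy stand-in embeddings; real use would encode images and texts with CLIP.
rng = np.random.default_rng(0)
image = rng.normal(size=8)
candidates = [
    image + rng.normal(scale=0.1, size=8),  # close paraphrase of the image
    rng.normal(size=8),                     # unrelated text
    image + rng.normal(scale=0.5, size=8),  # loosely related text
]
ranking = rank_descriptions(image, candidates)
```

Keeping only the top-ranked descriptions per class is one plausible way such a score could filter noisy LLM-generated text before training.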
Anthology ID:
2025.emnlp-main.1717
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
33838–33855
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1717/
Cite (ACL):
Chaeri Kim, Jaeyeon Bae, and Taehwan Kim. 2025. Data Descriptions from Large Language Models with Influence Estimation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 33838–33855, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Data Descriptions from Large Language Models with Influence Estimation (Kim et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1717.pdf
Checklist:
2025.emnlp-main.1717.checklist.pdf