RRInf: Efficient Influence Function Estimation via Ridge Regression for Large Language Models and Text-to-Image Diffusion Models

Zhuozhuo Tu, Cheng Chen, Yuxuan Du


Abstract
Data quality plays a vital role in the development of large-scale generative models. Understanding how important a data point is for a generative model is essential for explaining the model's behavior and improving its performance. The influence function provides a framework for quantifying the impact of individual training data points on model predictions. However, its high computational cost has hindered its applicability in large-scale settings. In this work, we present RRInf, a novel and principled method for estimating influence functions in large-scale generative AI models. We show that influence function estimation can be transformed into a ridge regression problem. Based on this insight, we develop an algorithm that is efficient and scalable to large models. Experiments on noisy data detection and influential data identification tasks demonstrate that RRInf outperforms existing methods in both efficiency and effectiveness on commonly used large models: RoBERTa-large, Llama-2-13B-chat, Llama-3-8B, and stable-diffusion-v1.5.
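To illustrate the connection between influence estimation and ridge regression that the abstract states, the following is a minimal sketch of one generic, well-known link, and not necessarily the RRInf algorithm itself. It assumes the Hessian is approximated by a damped Gauss-Newton matrix H = G^T G / n + lam * I built from per-example training gradients G (one row per example); under that assumption, the inverse-Hessian-vector product H^{-1} v can be recovered from the residual of a ridge regression that fits v with the rows of G. The function names (ihvp_via_ridge, influence_scores), the damping value lam, and the Gauss-Newton approximation are all illustrative assumptions that do not come from the paper.

```python
import numpy as np

def ihvp_via_ridge(G, v, lam):
    """Return (G.T @ G / n + lam * I)^{-1} @ v without forming the d x d matrix.

    G   : (n, d) array of per-example gradients (assumed input, hypothetical).
    v   : (d,) vector, e.g. the gradient of a test loss.
    lam : positive damping / ridge strength (assumed hyperparameter).
    """
    n = G.shape[0]
    # Ridge regression in the n-dimensional "dual" space:
    #   min_w ||G.T @ w - v||^2 + n * lam * ||w||^2
    # whose closed-form solution is w = (G G^T + n*lam*I)^{-1} G v.
    w = np.linalg.solve(G @ G.T + n * lam * np.eye(n), G @ v)
    # By the Woodbury identity, the ridge residual scaled by 1/lam
    # equals the damped inverse-Hessian-vector product.
    return (v - G.T @ w) / lam

def influence_scores(G_train, g_test, lam):
    """Classical influence of each training example on a test loss:
    score_i = -g_i^T H^{-1} g_test, with H approximated as above."""
    return -G_train @ ihvp_via_ridge(G_train, g_test, lam)
```

The sketch solves an n x n system instead of a d x d one, which only helps when the number of gradients n is much smaller than the parameter dimension d; a method meant to scale to large models would need further approximations that this sketch does not attempt.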
Anthology ID:
2025.emnlp-main.933
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
18505–18518
URL:
https://preview.aclanthology.org/ingest-luhme/2025.emnlp-main.933/
DOI:
10.18653/v1/2025.emnlp-main.933
Cite (ACL):
Zhuozhuo Tu, Cheng Chen, and Yuxuan Du. 2025. RRInf: Efficient Influence Function Estimation via Ridge Regression for Large Language Models and Text-to-Image Diffusion Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 18505–18518, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
RRInf: Efficient Influence Function Estimation via Ridge Regression for Large Language Models and Text-to-Image Diffusion Models (Tu et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-luhme/2025.emnlp-main.933.pdf
Checklist:
 2025.emnlp-main.933.checklist.pdf