Data Poisoning for In-context Learning

Pengfei He, Han Xu, Yue Xing, Hui Liu, Makoto Yamada, Jiliang Tang


Abstract
In-context learning (ICL) has emerged as a capability of large language models (LLMs), enabling them to adapt to new tasks using provided examples. While ICL has demonstrated strong effectiveness, there is limited understanding of its vulnerability to potential threats. This paper examines ICL’s vulnerability to data poisoning attacks. We introduce ICLPoison, an attack method specifically designed to exploit ICL’s unique learning mechanisms by identifying discrete text perturbations that influence LLM hidden states. We propose three representative attack strategies, evaluated across various models and tasks. Our experiments, including those on GPT-4, show that ICL performance can be significantly compromised by these attacks, highlighting the urgent need for improved defense mechanisms to protect LLMs’ integrity and reliability.
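The abstract describes the core idea of ICLPoison as finding discrete text perturbations of in-context demonstrations that distort the LLM's hidden states. The sketch below illustrates that objective only, not the authors' implementation: it uses a placeholder HuggingFace model (`gpt2`), a toy candidate vocabulary, and a simple greedy suffix search, whereas the paper proposes three distinct perturbation strategies and evaluates much larger models. All names and hyperparameters here are illustrative assumptions.

```python
# Hedged sketch of a hidden-state-distortion poisoning loop (NOT the ICLPoison code).
# Assumption: a HuggingFace causal LM; model name, vocabulary, and edit budget are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the paper evaluates larger LLMs, including GPT-4 via API
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()


@torch.no_grad()
def last_hidden(text: str) -> torch.Tensor:
    """Final-layer hidden state of the last token, used here as a proxy 'task representation'."""
    ids = tokenizer(text, return_tensors="pt")
    out = model(**ids)
    return out.hidden_states[-1][0, -1]  # shape: (hidden_dim,)


@torch.no_grad()
def greedy_poison(demo: str, candidate_tokens: list[str], n_edits: int = 3) -> str:
    """Greedily append candidate tokens to a demonstration to maximally shift its hidden state.

    This is one simple instance of a discrete perturbation search; the paper's three
    strategies differ in how candidate perturbations are generated and applied.
    """
    clean_h = last_hidden(demo)
    poisoned = demo
    for _ in range(n_edits):
        best_text, best_dist = poisoned, -1.0
        for tok in candidate_tokens:
            cand = poisoned + " " + tok  # suffix perturbation, for illustration only
            dist = torch.norm(last_hidden(cand) - clean_h).item()
            if dist > best_dist:
                best_text, best_dist = cand, dist
        poisoned = best_text
    return poisoned


# Usage: poison one in-context demonstration for a sentiment-classification prompt.
demo = "Review: a joyful, well-acted film. Sentiment: positive"
candidates = ["nevertheless", "zero", "quantum", "banana", "notwithstanding"]
print(greedy_poison(demo, candidates))
```

The design intuition, as stated in the abstract, is that ICL relies on the hidden-state representations induced by the demonstrations; if a perturbed demonstration pushes those representations away from their clean values while remaining valid text, downstream ICL predictions degrade.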
Anthology ID:
2025.findings-naacl.91
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1680–1700
URL:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.findings-naacl.91/
Cite (ACL):
Pengfei He, Han Xu, Yue Xing, Hui Liu, Makoto Yamada, and Jiliang Tang. 2025. Data Poisoning for In-context Learning. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1680–1700, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Data Poisoning for In-context Learning (He et al., Findings 2025)
PDF:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.findings-naacl.91.pdf