Data Poisoning for In-context Learning
Pengfei He, Han Xu, Yue Xing, Hui Liu, Makoto Yamada, Jiliang Tang
Abstract
In-context learning (ICL) has emerged as a capability of large language models (LLMs), enabling them to adapt to new tasks using provided examples. While ICL has demonstrated strong effectiveness, there is limited understanding of its vulnerability to potential threats. This paper examines ICL's vulnerability to data poisoning attacks. We introduce ICLPoison, an attack method specifically designed to exploit ICL's unique learning mechanisms by identifying discrete text perturbations that influence LLM hidden states. We propose three representative attack strategies, evaluated across various models and tasks. Our experiments, including those on GPT-4, show that ICL performance can be significantly compromised by these attacks, highlighting the urgent need for improved defense mechanisms to protect LLMs' integrity and reliability.
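To make the abstract's core idea concrete, below is a minimal sketch of a hidden-state-shifting perturbation search, assuming a HuggingFace causal LM. The model name, the greedy suffix-token search, the L2 distance objective, and the `poison_demo` helper are illustrative assumptions and not the paper's ICLPoison algorithm.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative open model; the paper evaluates a range of LLMs, including GPT-4.
MODEL_NAME = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def last_hidden_state(text: str) -> torch.Tensor:
    """Last-layer hidden state of the final token of `text`."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[-1][0, -1]  # shape: (hidden_dim,)

def poison_demo(demo: str, candidate_tokens: list[str]) -> str:
    """Greedily append the candidate token that most displaces the
    demonstration's hidden state (a stand-in for a discrete perturbation search)."""
    clean = last_hidden_state(demo)
    best, best_dist = demo, -1.0
    for tok in candidate_tokens:
        perturbed = f"{demo} {tok}"
        dist = torch.norm(last_hidden_state(perturbed) - clean).item()
        if dist > best_dist:
            best, best_dist = perturbed, dist
    return best

# Hypothetical ICL demonstration and candidate pool.
demo = "Review: a delightful, well-paced film. Sentiment: positive"
candidates = ["vortex", "null", "banal", "zeal", "quixotic"]
print(poison_demo(demo, candidates))
```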
- Anthology ID: 2025.findings-naacl.91
- Volume: Findings of the Association for Computational Linguistics: NAACL 2025
- Month: April
- Year: 2025
- Address: Albuquerque, New Mexico
- Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 1680–1700
- URL: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.findings-naacl.91/
- Cite (ACL): Pengfei He, Han Xu, Yue Xing, Hui Liu, Makoto Yamada, and Jiliang Tang. 2025. Data Poisoning for In-context Learning. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1680–1700, Albuquerque, New Mexico. Association for Computational Linguistics.
- Cite (Informal): Data Poisoning for In-context Learning (He et al., Findings 2025)
- PDF: https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.findings-naacl.91.pdf