@inproceedings{schoch-ji-2025-good,
  title     = {The Good, the Bad, and the Debatable: A Survey on the Impacts of Data for In-Context Learning},
  author    = {Schoch, Stephanie and
               Ji, Yangfeng},
  editor    = {Christodoulopoulos, Christos and
               Chakraborty, Tanmoy and
               Rose, Carolyn and
               Peng, Violet},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  month     = nov,
  year      = {2025},
  address   = {Suzhou, China},
  publisher = {Association for Computational Linguistics},
  url       = {https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1514/},
  pages     = {29786--29800},
  isbn      = {979-8-89176-332-6},
  abstract  = {In-context learning is an emergent learning paradigm that enables an LLM to learn an unseen task by seeing a number of demonstrations in the context window. The quality of the demonstrations is of paramount importance as 1) context window size limitations restrict the number of demonstrations that can be presented to the model, and 2) the model must identify the task and potentially learn new, unseen input-output mappings from the limited demonstration set. An increasing body of work has also shown the sensitivity of predictions to perturbations on the demonstration set. Given this importance, this work presents a survey on the current literature pertaining to the relationship between data and in-context learning. We present our survey in three parts: the ``good'' {--} qualities that are desirable when selecting demonstrations, the ``bad'' {--} qualities of demonstrations that can negatively impact the model, as well as issues that can arise in presenting demonstrations, and the ``debatable'' {--} qualities of demonstrations with mixed results or factors modulating data impacts.},
}
Markdown (Informal)
[The Good, the Bad, and the Debatable: A Survey on the Impacts of Data for In-Context Learning](https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1514/) (Schoch & Ji, EMNLP 2025)
ACL