What Makes Good Instruction-Tuning Data? An In-Context Learning Perspective

Guangzeng Han, Xiaolei Huang


Abstract
Instruction-tuning datasets often contain substantial redundancy and low-quality samples, which calls for effective data selection methods. We propose an instruction data selection framework based on weighted in-context influence (wICI), which measures how effectively each candidate example reduces instruction-following difficulty for semantically related peers. Through systematic experiments, we address three key questions: what constitutes effective instruction-tuning data from an in-context perspective, whether sample difficulty correlates with in-context influence, and how in-context influence translates into instruction-tuning effectiveness. Experiments across multiple models and benchmarks show that our method consistently outperforms existing baselines under constrained data budgets, and that sample difficulty is negatively correlated with in-context influence.
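The abstract does not give the exact definition of wICI. As a rough illustration only, the sketch below shows one plausible reading. A candidate's score is the similarity-weighted average drop in each peer's difficulty (for example, the peer's response loss) when the candidate is supplied in context. The function names, the cosine-similarity weighting, the clipping of negative similarities to zero, and the use of precomputed losses are all assumptions made for this sketch. They are not the authors' formulation.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_in_context_influence(
    cand_emb: Sequence[float],
    peer_embs: List[Sequence[float]],
    loss_zero_shot: Sequence[float],
    loss_with_demo: Sequence[float],
) -> float:
    """Hypothetical wICI score for a single candidate example.

    loss_zero_shot[j]: assumed difficulty of peer j with no demonstration.
    loss_with_demo[j]: assumed difficulty of peer j with the candidate in context.
    """
    # Weight each peer by its (non-negative) semantic similarity to the candidate.
    weights = [max(cosine(cand_emb, e), 0.0) for e in peer_embs]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    # Positive gain means the candidate made the peer easier to answer.
    gains = [z - d for z, d in zip(loss_zero_shot, loss_with_demo)]
    return sum(w * g for w, g in zip(weights, gains)) / total


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Keep the k candidates with the highest (hypothetical) wICI scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, with a candidate embedding `[1.0, 0.0]`, peers `[[1.0, 0.0], [0.0, 1.0]]`, zero-shot losses `[2.0, 3.0]` and in-context losses `[1.5, 1.0]`, only the first (similar) peer receives weight, so the score is its loss reduction of 0.5. The orthogonal peer's larger gain is ignored. Under this reading, weighting by similarity is what restricts influence to "semantically related peers".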
Anthology ID:
2026.acl-long.45
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1013–1027
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.45/
Cite (ACL):
Guangzeng Han and Xiaolei Huang. 2026. What Makes Good Instruction-Tuning Data? An In-Context Learning Perspective. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1013–1027, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
What Makes Good Instruction-Tuning Data? An In-Context Learning Perspective (Han & Huang, ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.45.pdf
Checklist:
2026.acl-long.45.checklist.pdf