Interpret and Improve In-Context Learning via the Lens of Input-Label Mappings
Chenghao Sun | Zhen Huang | Yonggang Zhang | Le Lu | Houqiang Li | Xinmei Tian | Xu Shen | Jieping Ye
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) excel at downstream NLP tasks through in-context learning (ICL) with a few demonstrations of input-label pairs. However, the internal mechanisms behind ICL remain under-explored, particularly the mappings between inputs and labels. In this work, we reverse-engineer ICL by examining input-label mappings: what they are within LLMs, where they function, and how LLMs utilize them. (1) what: We discover that input-label mappings are stored within a few specific layers in the form of principal components (PCs), which capture human-interpretable, task-related words. (2) where: We propose a PC patching approach to identify the modules where input-label mappings function. Specifically, PC patching automatically crafts counterfactual representations using the identified semantic PCs, rather than manually designed counterfactual text, to suppress ICL-related behavior in candidate modules. Using PC patching, we find that LLMs apply input-label mappings within a small fraction of attention heads. (3) how: We observe and verify that the identified key heads utilize input-label mappings from demonstrations to generate target labels for new queries. Based on these discoveries, we further show that precisely fine-tuning key ICL-related modules leads to significant improvements across diverse tasks.
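The PC patching idea described in the abstract can be sketched in a few lines: take a module's activations over ICL prompts, extract their top principal components, project those directions out to craft a counterfactual representation, and contrast the effect with a random-subspace control. The sketch below is illustrative only, not the authors' code: `head_acts`, `semantic_pcs`, and the synthetic data are assumptions standing in for real LLM hidden states from one attention head.

```python
# Minimal sketch of PC patching on synthetic activations (assumed names, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n_prompts, d_head = 64, 128
head_acts = rng.normal(size=(n_prompts, d_head))   # stand-in for one attention head's outputs over ICL prompts

# (1) "what": extract principal components of the centered activations.
mean = head_acts.mean(axis=0, keepdims=True)
_, _, vt = np.linalg.svd(head_acts - mean, full_matrices=False)
semantic_pcs = vt[:2]                               # top-k PCs as candidate input-label directions

# (2) "where": craft a counterfactual representation by projecting out the PC subspace,
# which suppresses whatever behavior those directions carry; in the paper, the model is
# then re-run with the patched activations and the drop in ICL accuracy is measured per head.
proj = semantic_pcs.T @ semantic_pcs                # projector onto the identified PC subspace
patched_acts = mean + (head_acts - mean) @ (np.eye(d_head) - proj)

# Control condition: remove a random subspace of the same rank. A head counts as
# ICL-related only if patching its semantic PCs hurts label prediction far more
# than patching this random subspace does.
rand_q, _ = np.linalg.qr(rng.normal(size=(d_head, 2)))
rand_proj = rand_q @ rand_q.T
control_acts = mean + (head_acts - mean) @ (np.eye(d_head) - rand_proj)
```

In an actual experiment, `patched_acts` and `control_acts` would be substituted back into the forward pass (e.g. via activation hooks) so the change in label prediction accuracy can be compared head by head.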