IAD: In-Context Learning Ability Decoupler of Large Language Models in Meta-Training

Yuhan Liu, Xiuying Chen, Gao Xing, Ji Zhang, Rui Yan


Abstract
Large Language Models (LLMs) exhibit a remarkable In-Context Learning (ICL) ability, where the model learns a task from prompts consisting of input-output examples. However, the pre-training objectives of LLMs often misalign with the objectives of ICL: LLMs are mainly pre-trained with objectives such as masked language modeling and next-sentence prediction, whereas ICL leverages example pairs to guide the model in generating task-aware responses for tasks such as text classification and question answering. The capabilities tied to these basic pre-training tasks can overshadow or conflict with the task-specific subtleties required in ICL. To address this, we propose an In-Context Learning Ability Decoupler (IAD), which separates the ICL ability from the general ability of LLMs during the meta-training phase, where the ICL-related parameters are tuned separately to adapt to ICL tasks. Concretely, we first identify the parameters that are suitable for ICL via transference-driven gradient importance. We then propose a new max-margin loss to emphasize the separation of the general and ICL abilities; the loss is defined over the difference between the ICL output and that of the original LLM, which prevents the LLM from becoming overconfident. By meta-training these ICL-related parameters with the max-margin loss, the model learns and adapts to new tasks effectively with limited data. Experimental results show that IAD achieves state-of-the-art performance on benchmark datasets while tuning only 30% of the model's parameters. Ablation studies and detailed analyses further confirm the separation of the two abilities.
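Since this page carries only the abstract, the following is a minimal, hypothetical sketch of how the two ingredients it names could look in practice: ranking parameters by a gradient-importance score on ICL-formatted data and keeping roughly the top 30% for meta-training, plus a hinge-style max-margin term on the gap between the tuned model's output and the frozen base model's output. The function names, the |grad x param| importance heuristic, and the hinge formulation are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the two ideas named in the abstract, NOT the paper's code:
# (1) score parameters by gradient importance on ICL-style batches and keep the
#     top fraction for tuning; (2) a max-margin term on the gap between the tuned
#     (ICL) model's output and the frozen base model's output.
import torch
import torch.nn.functional as F


def importance_scores(model, icl_batches, loss_fn):
    """Accumulate |grad * param| per parameter tensor over ICL-formatted batches
    (an assumed stand-in for the paper's transference-driven gradient importance)."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for inputs, targets in icl_batches:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += (p.grad * p).abs().detach()
    return scores


def select_icl_parameters(model, scores, keep_ratio=0.3):
    """Freeze everything, then unfreeze the tensors with the highest mean importance
    (the abstract reports tuning only ~30% of the model's parameters)."""
    ranked = sorted(scores, key=lambda n: scores[n].mean().item(), reverse=True)
    keep = set(ranked[: max(1, int(len(ranked) * keep_ratio))])
    for n, p in model.named_parameters():
        p.requires_grad_(n in keep)
    return keep


def max_margin_loss(icl_logits, base_logits, targets, margin=1.0):
    """Task loss plus a hinge on the gap between the tuned (ICL) model's target
    score and the frozen base model's, so the base model's confidence does not dominate."""
    task = F.cross_entropy(icl_logits, targets)
    gap = (icl_logits.gather(-1, targets.unsqueeze(-1))
           - base_logits.gather(-1, targets.unsqueeze(-1)).detach())
    return task + F.relu(margin - gap).mean()
```

In this reading, the hinge pushes the tuned model's score on the target at least `margin` above the frozen base model's score, one plausible instantiation of a loss "defined as the difference between the output of ICL and the original LLM" that guards against the base model's overconfidence.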
Anthology ID:
2024.lrec-main.749
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
8535–8545
URL:
https://aclanthology.org/2024.lrec-main.749
Cite (ACL):
Yuhan Liu, Xiuying Chen, Gao Xing, Ji Zhang, and Rui Yan. 2024. IAD: In-Context Learning Ability Decoupler of Large Language Models in Meta-Training. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 8535–8545, Torino, Italia. ELRA and ICCL.
Cite (Informal):
IAD: In-Context Learning Ability Decoupler of Large Language Models in Meta-Training (Liu et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2024.lrec-main.749.pdf