CoverICL: Selective Annotation for In-Context Learning via Active Graph Coverage
Costas Mavromatis, Balasubramaniam Srinivasan, Zhengyuan Shen, Jiani Zhang, Huzefa Rangwala, Christos Faloutsos, George Karypis
Abstract
In-context learning (ICL) adapts Large Language Models (LLMs) to new tasks, without requiring any parameter updates, but few annotated examples as input. In this work, we investigate selective annotation for ICL, where there is a limited budget for annotating examples, similar to low-budget active learning (AL). Although uncertainty-based selection is unreliable with few annotated data, we present CoverICL, an adaptive graph-based selection algorithm, that effectively incorporates uncertainty sampling into selective annotation for ICL. First, CoverICL builds a nearest-neighbor graph based on the semantic similarity between candidate ICL examples. Then, CoverICL employs uncertainty estimation by the LLM to identify hard examples for the task. Selective annotation is performed over the active graph of the hard examples, adapting the process to the particular LLM used and the task tackled. CoverICL selects the most representative examples by solving a Maximum Coverage problem, approximating diversity-based sampling. Extensive experiments on ten datasets and seven LLMs show that, by incorporating uncertainty via coverage on the active graph, CoverICL (1) outperforms existing AL methods for ICL by 2–4.6% accuracy points, (2) is up to 2x more budget-efficient than SOTA methods for low-budget AL, and (3) generalizes better across tasks compared to non-graph alternatives.- Anthology ID:
- 2024.emnlp-main.1185
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 21268–21286
- Language:
- URL:
- https://aclanthology.org/2024.emnlp-main.1185
- DOI:
- 10.18653/v1/2024.emnlp-main.1185
- Cite (ACL):
- Costas Mavromatis, Balasubramaniam Srinivasan, Zhengyuan Shen, Jiani Zhang, Huzefa Rangwala, Christos Faloutsos, and George Karypis. 2024. CoverICL: Selective Annotation for In-Context Learning via Active Graph Coverage. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 21268–21286, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- CoverICL: Selective Annotation for In-Context Learning via Active Graph Coverage (Mavromatis et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.emnlp-main.1185.pdf