Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders

Ailiang Lin, Zhuoyun Li, Keyu Mao, Kotaro Funakoshi, Manabu Okumura


Abstract
Large language models (LLMs) have been widely explored for embedding generation. While recent studies show that in-context learning (ICL) effectively enhances the representational capability of LLMs by prepending a few task-related demonstrations, it causes substantial token overhead due to the increased sequence length. In this work, we propose EPIC, a novel embedding-based in-context prompt training strategy that leverages ICL to generate high-quality embeddings while reducing the computational burden during both training and inference. This approach replaces discrete text demonstrations with their corresponding continuous embeddings, which not only encourages the LLM to align semantically related text pairs during contrastive learning, but also requires the model to interpret demonstration embeddings as part of the in-context prompt. Consequently, EPIC-trained models achieve excellent embedding performance both with and without in-context prompts at inference time. Comprehensive experiments demonstrate that our method establishes new state-of-the-art results on the MTEB benchmark, surpassing frontier models trained solely on publicly available retrieval data. Extensive ablation studies further validate the effectiveness and necessity of our mechanism.
Anthology ID:
2026.findings-acl.1454
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
29079–29095
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1454/
Cite (ACL):
Ailiang Lin, Zhuoyun Li, Keyu Mao, Kotaro Funakoshi, and Manabu Okumura. 2026. Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders. In Findings of the Association for Computational Linguistics: ACL 2026, pages 29079–29095, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders (Lin et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1454.pdf
Checklist:
 2026.findings-acl.1454.checklist.pdf