KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs

Yixuan Tang, Yi Yang


Abstract
While large language models (LLMs) are powerful embedding backbones, their use in training-free settings faces two structural challenges: causal attention prevents early tokens from accessing subsequent context, and the next-token prediction objective biases representations toward generation rather than semantic compression. To address these limitations, we propose KV-Embedding, a framework that activates the latent representation power of frozen LLMs. Our method builds on the observation that the key-value (KV) states of the final token at each layer encode a compressed view of the sequence. By re-routing these states as a prepended prefix, we enable all tokens to access sequence-level context within a single forward pass. To ensure model-agnostic applicability, we introduce an automated layer selection strategy based on intrinsic dimensionality. Evaluations on MTEB across Qwen, Mistral, and Llama backbones show that KV-Embedding outperforms existing training-free baselines by up to 10%, while maintaining robust performance on sequences of up to 4,096 tokens. These results demonstrate that internal state manipulation offers an efficient alternative to input modification, and we hope this work encourages further exploration of LLM internals for representation learning.
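The re-routing idea can be illustrated with a toy, single-head attention layer. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that, at a given layer, the final token's key and value vectors are copied and prepended as a one-slot prefix that every token may attend to, while ordinary causal masking still applies among the real tokens. Rotary position embeddings, multiple heads, the MLP block, normalization, the intrinsic-dimensionality layer selection, and pooling are all omitted; the function names `attention` and `layer` and the random weights are hypothetical.

```python
import numpy as np

# Toy sketch (assumption): the final token's K/V at a layer are prepended as a
# one-slot prefix visible to all tokens. Not the paper's implementation.


def attention(q, k, v, n_prefix=0):
    """Single-head scaled dot-product attention with a causal mask.

    q has shape (T, d); k and v have shape (n_prefix + T, d). The first
    n_prefix rows of k/v form a prefix visible to every query; the remaining
    T rows are real tokens, which query i may see only up to position i.
    Returns the attended values (T, d) and the attention weights.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                        # (T, n_prefix + T)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)   # True where j > i
    mask = np.hstack([np.zeros((T, n_prefix), dtype=bool), future])
    scores = np.where(mask, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


def layer(x, w_q, w_k, w_v, reroute):
    """One toy attention layer (no RoPE, MLP, or normalization).

    If reroute is True, the final token's key/value at this layer are copied
    and prepended as a prefix, so every token, including the first, can
    attend to a summary of the whole sequence.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    if not reroute:
        return attention(q, k, v)
    k_aug = np.vstack([k[-1:], k])  # prefix slot = final token's key
    v_aug = np.vstack([v[-1:], v])  # prefix slot = final token's value
    return attention(q, k_aug, v_aug, n_prefix=1)


rng = np.random.default_rng(0)
T, d = 5, 8
x = rng.standard_normal((T, d))  # stand-in hidden states for T tokens
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

_, w_causal = layer(x, w_q, w_k, w_v, reroute=False)
_, w_kv = layer(x, w_q, w_k, w_v, reroute=True)

print("first token, causal:  ", np.round(w_causal[0], 3))
print("first token, rerouted:", np.round(w_kv[0], 3))
```

Under plain causal masking the first token can only attend to itself, so all of its weight sits on position 0. With the re-routed prefix, part of that weight moves to the prefix slot, which carries the final token's view of the whole sequence. In this sketch the prefix is built from keys and values already computed at the same layer, which is why no second forward pass is needed.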
Anthology ID:
2026.acl-long.540
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
11773–11794
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.540/
Cite (ACL):
Yixuan Tang and Yi Yang. 2026. KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11773–11794, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs (Tang & Yang, ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.540.pdf
Checklist:
2026.acl-long.540.checklist.pdf