Co-Evolving LLMs and Embedding Models via Density-Guided Preference Optimization for Text Clustering

Zetong Li, Qinliang Su, Minhua Huang, Yin Yang


Abstract
Large language models (LLMs) have shown strong potential in enhancing text clustering when combined with traditional embedding models. However, existing methods predominantly treat LLMs as static pseudo-oracles: they unidirectionally query the LLM for similarity assessments or data augmentation, but never use feedback from the embedding model to improve the LLM itself. In this work, we propose a training framework that enables bidirectional refinement between LLMs and embedding models. We first design task-aware prompts that guide the LLM to generate interpretations of the input texts. These interpretations are projected into the embedding space, where those preferred by the embedding model are selected according to their distribution densities. The selected interpretations are then used to fine-tune the LLM via preference optimization so that it prioritizes generating helpful interpretations. Meanwhile, we enhance the embedding model via contrastive learning on the generated interpretations and perform clustering on the output embeddings, yielding an iterative co-training loop between the LLM and the embedding model. Experiments on 14 benchmark datasets across 5 tasks demonstrate the effectiveness of our method.
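As a rough illustration of the loop the abstract describes, the following is a minimal Python sketch, assuming a kernel-density estimator for the density-guided selection, a fixed 50/50 preferred/rejected split, and k-means for the final clustering. All names here (select_by_density, co_train, StubLLM, StubEncoder) are hypothetical placeholders, not the authors' implementation.

import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.cluster import KMeans

def select_by_density(embeddings, top_frac=0.5):
    """Rank interpretations by the estimated density of their embeddings and
    split them into preferred (dense) and rejected (sparse) index sets.
    The Gaussian KDE and the top_frac threshold are assumptions."""
    kde = KernelDensity(bandwidth=0.5).fit(embeddings)
    log_density = kde.score_samples(embeddings)  # one log-density per sample
    order = np.argsort(-log_density)             # densest first
    k = max(1, int(top_frac * len(order)))
    return order[:k], order[k:]

def co_train(texts, llm, encoder, n_clusters, n_rounds=3):
    """One possible reading of the iterative co-training loop in the abstract."""
    for _ in range(n_rounds):
        # 1. Task-aware prompting: the LLM interprets each input text.
        interps = [llm.generate(t) for t in texts]
        # 2. Project the interpretations into the embedding space.
        embs = np.stack([encoder.embed(s) for s in interps])
        # 3. Density-guided selection: interpretations in dense regions are
        #    treated as the ones the embedding model "prefers".
        preferred, rejected = select_by_density(embs)
        # 4. Preference optimization (e.g., a DPO-style update) nudges the LLM
        #    toward generating helpful interpretations.
        llm.preference_update([interps[i] for i in preferred],
                              [interps[i] for i in rejected])
        # 5. Contrastive learning refines the embedding model on the
        #    (text, interpretation) pairs.
        encoder.contrastive_update(texts, interps)
    # 6. Cluster the embeddings produced by the co-trained pair.
    final = np.stack([encoder.embed(llm.generate(t)) for t in texts])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(final)

# Inert stand-ins so the sketch runs end to end; real models would replace them.
class StubLLM:
    def generate(self, text):
        return "interpretation of: " + text
    def preference_update(self, chosen, rejected):
        pass  # a real implementation would run a preference-optimization step

class StubEncoder:
    def embed(self, text):
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.normal(size=16)
    def contrastive_update(self, texts, interps):
        pass  # a real implementation would run a contrastive-learning step

labels = co_train([f"doc {i}" for i in range(8)], StubLLM(), StubEncoder(), n_clusters=3)
print(labels)

The design idea recoverable from the abstract is that embedding-space density acts as an implicit preference signal from the embedding model, so preference pairs for fine-tuning the LLM can be constructed without human labels.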
Anthology ID: 2025.emnlp-main.241
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 4796–4808
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.241/
Cite (ACL): Zetong Li, Qinliang Su, Minhua Huang, and Yin Yang. 2025. Co-Evolving LLMs and Embedding Models via Density-Guided Preference Optimization for Text Clustering. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4796–4808, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Co-Evolving LLMs and Embedding Models via Density-Guided Preference Optimization for Text Clustering (Li et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.241.pdf
Checklist: 2025.emnlp-main.241.checklist.pdf