@inproceedings{li-etal-2025-co,
    title = "Co-Evolving {LLM}s and Embedding Models via Density-Guided Preference Optimization for Text Clustering",
    author = "Li, Zetong  and
      Su, Qinliang  and
      Huang, Minhua  and
      Yang, Yin",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.241/",
    pages = "4796--4808",
    ISBN = "979-8-89176-332-6",
    abstract = "Large language models (LLMs) have shown strong potential in enhancing text clustering when combined with traditional embedding models. However, existing methods predominantly treat LLMs as static pseudo-oracles, i.e., unidirectionally querying them for similarity assessment or data augmentation, while never seeking feedback from embedding models to improve them. In this work, we propose a training framework that enables bidirectional refinement between LLMs and embedding models. We first design task-aware prompts to guide the LLM in generating interpretations for the input texts. These interpretations are projected into the embedding space, in which interpretations that are preferred by the embedding model are selected based on their distribution densities. The selected interpretations are then used to fine-tune the LLM via preference optimization to prioritize the generation of helpful interpretations. Meanwhile, we enhance the embedding model via contrastive learning on the generated interpretations and perform clustering on the output embeddings, leading to iterative co-training between the LLM and the embedding model. Experiments on 14 benchmark datasets across 5 tasks demonstrate the effectiveness of our method."
}