Shivani Manivasagan
2025
Controllable Clustering with LLM-driven Embeddings
Kerria Pang-Naylor | Shivani Manivasagan | Aitong Zhong | Mehak Garg | Nicholas Mondello | Blake Buckner | Jonathan P. Chang | Khyati Mahajan | Masoud Hashemi | Fabio Casati
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Given the inherent subjectivity of similarity in text, fully unsupervised text clustering is unlikely to produce groupings that work across a variety of use cases. Traditional techniques for guiding clustering rely on costly, time-consuming human feedback, pre-existing labels, or both. Leveraging recent advancements in LLMs and decoder-only embedding models, we present two techniques to effectively control text embeddings with minimal human input: prefix instructions and LLM preprocessing. We evaluate clustering performance on datasets with multiple independent ground-truth labels, or perspectives, and find that these techniques can improve clustering for one perspective or use case, at the cost of reduced performance for another.
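As a rough illustration of evaluating one clustering against several ground-truth perspectives, the sketch below scores two sets of cluster assignments against two label sets. The abstract does not name the metric used, so purity is an assumption here, and the toy texts, labels, and cluster assignments are all hypothetical rather than taken from the paper.

```python
from collections import Counter


def purity(cluster_ids, labels):
    """Fraction of items whose label is the majority label of their cluster."""
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    if not labels:
        raise ValueError("need at least one item")
    # Count how often each label occurs inside each cluster.
    by_cluster = {}
    for cid, lab in zip(cluster_ids, labels):
        by_cluster.setdefault(cid, Counter())[lab] += 1
    # Sum the size of the majority label in every cluster.
    majority_total = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority_total / len(labels)


# Hypothetical toy data: six texts, each labeled from two perspectives.
topic = ["billing", "billing", "billing", "login", "login", "login"]
sentiment = ["pos", "neg", "pos", "neg", "pos", "neg"]

# Hypothetical cluster assignments, standing in for clusterings of
# embeddings steered toward one perspective or the other.
clusters_topic_steered = [0, 0, 0, 1, 1, 1]
clusters_sentiment_steered = [0, 1, 0, 1, 0, 1]

scores = {
    "topic_steered": {
        "topic": purity(clusters_topic_steered, topic),
        "sentiment": purity(clusters_topic_steered, sentiment),
    },
    "sentiment_steered": {
        "topic": purity(clusters_sentiment_steered, topic),
        "sentiment": purity(clusters_sentiment_steered, sentiment),
    },
}
print(scores)
```

In this toy setup each clustering scores perfectly on the perspective it is aligned with and lower on the other, which mirrors the kind of trade-off the abstract describes. A real evaluation would substitute actual embeddings, a clustering algorithm, and whatever metric the paper reports.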
Co-authors
- Blake Buckner 1
- Fabio Casati 1
- Jonathan P. Chang 1
- Mehak Garg 1
- Masoud Hashemi 1