ZeroDL: Zero-shot Distribution Learning for Text Clustering via Large Language Models

Hwiyeol Jo, Hyunwoo Lee, Kang Min Yoo, Taiwoo Park


Abstract
Advancements in large language models (LLMs) have brought significant progress on NLP tasks. However, if a task cannot be fully described in a prompt, a model may fail to carry it out. In this paper, we propose a simple yet effective method to contextualize a task for an LLM. The method (1) performs open-ended zero-shot inference over the entire dataset, (2) aggregates the inference results, and (3) incorporates the aggregated meta-information into the actual task. We demonstrate its effectiveness on text clustering tasks, enabling LLMs to perform text-to-text clustering and yielding improvements on several datasets. Furthermore, we examine the class labels generated for clustering, showing how the LLM understands the task through the data.
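The abstract's three-step pipeline could be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: `mock_llm_label` is a hypothetical stand-in for an open-ended zero-shot LLM call, and all function names are assumptions.

```python
from collections import Counter

def mock_llm_label(text):
    """Hypothetical stand-in for an open-ended zero-shot LLM query,
    e.g. asking the model 'What topic is this text about?'."""
    keywords = {"match": "sports", "game": "sports",
                "stock": "finance", "bank": "finance"}
    for keyword, label in keywords.items():
        if keyword in text.lower():
            return label
    return "other"

def zerodl_cluster(texts):
    # Step 1: open-ended zero-shot inference over the entire dataset.
    open_labels = [mock_llm_label(t) for t in texts]
    # Step 2: aggregate the inference results into meta-information,
    # here a frequency-ranked set of candidate class labels.
    candidate_classes = [label for label, _ in
                         Counter(open_labels).most_common()]
    # Step 3: incorporate the aggregated label set into the actual task;
    # here each text is assigned to one of the discovered classes.
    assignments = [label if label in candidate_classes
                   else candidate_classes[0]
                   for label in open_labels]
    return candidate_classes, assignments

texts = ["The match ended 2-1", "Stock prices fell", "The bank raised rates"]
classes, assignments = zerodl_cluster(texts)
```

In a real system, steps 1 and 3 would each be prompts to an LLM; the point of the aggregation step is that the label inventory is discovered from the data rather than specified in the prompt.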
Anthology ID:
2025.findings-acl.1005
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:
Findings | WS
Publisher:
Association for Computational Linguistics
Pages:
19597–19607
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1005/
Cite (ACL):
Hwiyeol Jo, Hyunwoo Lee, Kang Min Yoo, and Taiwoo Park. 2025. ZeroDL: Zero-shot Distribution Learning for Text Clustering via Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 19597–19607, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
ZeroDL: Zero-shot Distribution Learning for Text Clustering via Large Language Models (Jo et al., Findings 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1005.pdf