Topic Modeling for Short Texts via Optimal Transport-Based Clustering

Tu Vu, Manh Do, Tung Nguyen, Linh Ngo Van, Sang Dinh, Thien Huu Nguyen


Abstract
Discovering topics and learning document representations in topic space are two crucial aspects of topic modeling, particularly in the short-text setting, where inferring topic proportions for individual documents is highly challenging. Despite significant progress in neural topic modeling, keeping document representations, as well as topic embeddings, well separated from one another remains an open problem. In this paper, we propose a novel method called **En**hancing Global **C**lustering with **O**ptimal **T**ransport in Topic Modeling (EnCOT). Our approach introduces an abstract notion of global clusters to capture global information, and then employs the Optimal Transport framework to align document representations in the topic space with the global clusters, while also aligning the global clusters with topics. This dual alignment not only enhances the separation of documents in the topic space but also facilitates the learning of latent topics. Through extensive experiments, we demonstrate that our method outperforms state-of-the-art techniques in short-text topic modeling across commonly used metrics.
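As a rough illustration of the kind of alignment the abstract describes, the sketch below computes an entropy-regularized optimal transport plan between a handful of document representations and a few global clusters using Sinkhorn iterations. It is not the authors' implementation: the abstract does not say which solver, cost function, or marginals EnCOT uses, so the squared Euclidean cost, uniform marginals, Sinkhorn solver, and all names (`sinkhorn_plan`, `doc_theta`, `cluster_centers`, `eps`) are assumptions made for the example, and the data are random.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, eps=0.5, n_iters=500):
    """Entropy-regularized OT plan between histograms a and b (Sinkhorn iterations)."""
    K = np.exp(-cost / eps)  # Gibbs kernel built from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)  # rescale so column sums match b
        u = a / (K @ v)    # rescale so row sums match a
    return u[:, None] * K * v[None, :]


# Hypothetical toy data: 6 documents and 3 global clusters, all living in a 4-topic space.
rng = np.random.default_rng(0)
n_docs, n_clusters, n_topics = 6, 3, 4
doc_theta = rng.dirichlet(np.ones(n_topics), size=n_docs)
cluster_centers = rng.dirichlet(np.ones(n_topics), size=n_clusters)

# Assumed cost: squared Euclidean distance between each document and each cluster.
cost = ((doc_theta[:, None, :] - cluster_centers[None, :, :]) ** 2).sum(axis=-1)

# Assumed marginals: uniform mass over documents and over clusters.
a = np.full(n_docs, 1.0 / n_docs)
b = np.full(n_clusters, 1.0 / n_clusters)

plan = sinkhorn_plan(cost, a, b)
ot_distance = float((plan * cost).sum())  # transport cost under the plan

print(plan.round(3))
print(ot_distance)
```

Each row of `plan` shows how one document's mass is spread over the clusters. In a training setup, a transport cost of this kind could serve as a regularization term, and the same routine could be reused with clusters on one side and topic embeddings on the other to mirror the second alignment mentioned in the abstract.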
Anthology ID:
2025.findings-acl.398
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7666–7680
URL:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.398/
Cite (ACL):
Tu Vu, Manh Do, Tung Nguyen, Linh Ngo Van, Sang Dinh, and Thien Huu Nguyen. 2025. Topic Modeling for Short Texts via Optimal Transport-Based Clustering. In Findings of the Association for Computational Linguistics: ACL 2025, pages 7666–7680, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Topic Modeling for Short Texts via Optimal Transport-Based Clustering (Vu et al., Findings 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.398.pdf