Manh Do
2025
Topic Modeling for Short Texts via Optimal Transport-Based Clustering
Tu Vu | Manh Do | Tung Nguyen | Linh Ngo Van | Sang Dinh | Thien Huu Nguyen
Findings of the Association for Computational Linguistics: ACL 2025
Discovering topics and learning document representations in topic space are two crucial aspects of topic modeling, particularly in the short-text setting, where inferring topic proportions for individual documents is highly challenging. Despite significant progress in neural topic modeling, effectively separating document representations, as well as topic embeddings, remains an open problem. In this paper, we propose a novel method called **En**hancing Global **C**lustering with **O**ptimal **T**ransport in Topic Modeling (EnCOT). Our approach introduces an abstract notion of global clusters to capture global information, then uses the Optimal Transport framework to align document representations in the topic space with the global clusters while also aligning the global clusters with topics. This dual alignment not only enhances the separation of documents in the topic space but also facilitates the learning of latent topics. Through extensive experiments, we demonstrate that our method outperforms state-of-the-art techniques in short-text topic modeling across commonly used metrics.
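The dual alignment described in the abstract can be illustrated with a minimal sketch based on entropic-regularized optimal transport (Sinkhorn iterations). This is not the authors' implementation: the abstract does not specify the cost function, the marginals, or the solver, so the squared-Euclidean cost, the uniform marginals, the shared embedding dimension, and all names below (`sinkhorn`, `ot_alignment`, `docs`, `clusters`, `topics`) are assumptions made purely for illustration.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, n_iters=200):
    """Entropic-regularized OT plan between histograms a (n,) and b (m,)."""
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # rescale rows toward marginal a
        v = b / (K.T @ u)                # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]   # transport plan, shape (n, m)

def ot_alignment(x, y, eps=0.5):
    """Return the transport plan and transport cost between two point sets."""
    # Assumption: squared Euclidean cost, scaled to [0, 1] for numerical stability.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()
    # Assumption: uniform mass on every point of both sets.
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    plan = sinkhorn(cost, a, b, eps)
    return plan, float((plan * cost).sum())

# Hypothetical toy data; a real model would learn these jointly.
rng = np.random.default_rng(0)
docs = rng.normal(size=(8, 5))       # document representations
clusters = rng.normal(size=(3, 5))   # global cluster centers
topics = rng.normal(size=(4, 5))     # topic embeddings

# Dual alignment: documents <-> global clusters, and global clusters <-> topics.
plan_dc, loss_dc = ot_alignment(docs, clusters)
plan_ct, loss_ct = ot_alignment(clusters, topics)
alignment_loss = loss_dc + loss_ct
```

In a trainable model, a term like `alignment_loss` would presumably be added to the topic model's main objective and minimized with respect to the document, cluster, and topic parameters. How EnCOT weights or combines the two terms is not stated in the abstract.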