Tue Le
2025
HiCOT: Improving Neural Topic Models via Optimal Transport and Contrastive Learning
Hoang Tran Vuong | Tue Le | Tu Vu | Tung Nguyen | Linh Ngo Van | Sang Dinh | Thien Huu Nguyen
Findings of the Association for Computational Linguistics: ACL 2025
Recent advances in neural topic models (NTMs) have improved topic quality but still face challenges: weak document-topic alignment, high inference costs due to large pretrained language models (PLMs), and limited modeling of hierarchical topic structures. To address these issues, we introduce HiCOT (Hierarchical Clustering and Contrastive Learning with Optimal Transport for Neural Topic Modeling), a novel framework that enhances topic coherence and efficiency. HiCOT integrates Optimal Transport to refine document-topic relationships using compact PLM-based embeddings, capturing the semantic structure of the documents. Additionally, it employs hierarchical clustering combined with contrastive learning to disentangle topic-word and topic-topic relationships, ensuring clearer structure and better coherence. Experimental results on multiple benchmark datasets demonstrate HiCOT’s superior effectiveness over existing NTMs in topic coherence, topic quality, representation quality, and computational efficiency.
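To make the abstract's Optimal Transport component concrete, the sketch below computes an entropic-regularized transport plan between document and topic embeddings via Sinkhorn iterations. This is a minimal illustrative sketch in PyTorch, not HiCOT's actual implementation; the cosine-distance cost, the uniform marginals, and the `eps`/`n_iters` values are assumptions made here for illustration.

```python
import torch

def sinkhorn_plan(doc_emb, topic_emb, eps=0.05, n_iters=100):
    """Entropic-regularized OT plan aligning documents to topics (sketch).

    doc_emb:   (n_docs, d) PLM-based document embeddings (assumed given)
    topic_emb: (n_topics, d) learned topic embeddings (assumed given)
    """
    # Cost: cosine distance between each document and each topic
    d = torch.nn.functional.normalize(doc_emb, dim=-1)
    t = torch.nn.functional.normalize(topic_emb, dim=-1)
    cost = 1.0 - d @ t.t()                      # (n_docs, n_topics)

    # Uniform marginals over documents and topics (an assumption)
    a = torch.full((doc_emb.size(0),), 1.0 / doc_emb.size(0))
    b = torch.full((topic_emb.size(0),), 1.0 / topic_emb.size(0))

    K = torch.exp(-cost / eps)                  # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):                    # Sinkhorn fixed-point updates
        v = b / (K.t() @ u)
        u = a / (K @ v)
    # Transport plan: row i is a soft document-to-topic assignment
    return u.unsqueeze(1) * K * v.unsqueeze(0)
```

In a framework like the one described, such a plan (or the resulting OT distance) would serve as a training signal that pulls document representations toward semantically matching topics.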
Sharpness-Aware Minimization for Topic Models with High-Quality Document Representations
Tung Nguyen | Tue Le | Hoang Tran Vuong | Quang Duc Nguyen | Duc Anh Nguyen | Linh Ngo Van | Sang Dinh | Thien Huu Nguyen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Recent advanced frameworks for topic models have significantly improved performance compared to conventional probabilistic approaches. Such models, mostly built on neural network architectures together with advanced techniques such as contextual embeddings, optimal transport distances, and pre-trained language models, have effectively improved topic quality and document-topic distributions. Despite these improvements, such methods do not consider effective optimization of complex objective functions that combine a log-likelihood with additional regularization terms. In this study, we propose to apply an efficient optimization method to improve the generalization and performance of topic models. Our approach explicitly considers the sharpness of the loss landscape during optimization, forcing the optimizer to choose directions in parameter space that lead to flatter minima, where models are typically more stable and robust to small perturbations in the data. Additionally, we propose an effective strategy to select the flatness region for parameter optimization by leveraging the optimal transport distance between document-topic distributions and document-cluster proportions, which effectively enhances document representations. Experimental results on popular benchmark datasets demonstrate that our method effectively improves the performance of baseline topic models.
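The sharpness-aware update described above follows the general SAM recipe: take a gradient ascent step of radius rho to reach the worst-case nearby parameters, then compute the update gradient there. Below is a minimal generic PyTorch sketch of that two-step procedure under assumed names (`model`, `loss_fn`, `batch`, `rho`); the paper's OT-based strategy for selecting the flatness region is not reproduced here.

```python
import torch

def sam_update(model, loss_fn, batch, optimizer, rho=0.05):
    """One sharpness-aware minimization step (generic sketch)."""
    # 1) Gradients at the current parameters
    loss_fn(model, batch).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))

    # 2) Ascend to the worst-case point within an L2 ball of radius rho
    with torch.no_grad():
        eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
        for p, e in zip(params, eps):
            p.add_(e)
    model.zero_grad()

    # 3) Gradients at the perturbed point give the actual update direction
    loss_fn(model, batch).backward()

    # 4) Restore the parameters, then step with the sharpness-aware gradients
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
```

Because every step costs two forward/backward passes, SAM-style training roughly doubles per-step compute in exchange for the flatter, more robust minima the abstract describes.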