Minh Chu Xuan

2026

Dynamic topic models aim to reveal how themes emerge, evolve, and dissolve in time-stamped corpora, but existing approaches still face three major challenges: (i) encoders capture bag-of-words statistics but fail to align with the rich semantic priors of large pre-trained language models (PLMs), (ii) temporal linkages are often modeled as rigid one-to-one chains, limiting the ability to track non-linear evolution such as topic splits or merges, and (iii) interpretability remains shallow, relying on noisy top-word lists that obscure thematic clarity. We propose L-DNTM (LLM-Augmented for Dynamic Neural Topic Model), a variational framework designed to capture more faithful temporal trajectories. Our model integrates three key components: multi-objective distillation to inject PLM-derived semantic knowledge into the encoder, entropy-regularized optimal transport to align entire topic constellations across time for smooth yet flexible evolution, and LLM-guided refinement to sharpen topic–word distributions for improved interpretability. Extensive experiments on diverse corpora show that L-DNTM yields more coherent, temporally consistent, and interpretable topic dynamics, and further improves downstream classification and clustering.
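Entropy-regularized optimal transport, which the abstract above names as the mechanism for aligning topic sets across time slices, is commonly solved with Sinkhorn iterations. The sketch below is a generic illustration, not the L-DNTM implementation. It assumes each topic is represented by an embedding vector, uses 1 minus cosine similarity as the transport cost, and places uniform mass on the topics of each time slice; `align_topics`, `eps`, and `n_iters` are hypothetical names and settings.

```python
import math


def cosine_cost(x, y):
    """Cost = 1 - cosine similarity between two topic embedding vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)


def sinkhorn(cost, a, b, eps=0.1, n_iters=500):
    """Entropy-regularized OT via Sinkhorn scaling.

    cost: n x m cost matrix; a: source marginal (length n);
    b: target marginal (length m). Returns the n x m transport plan.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps); smaller eps gives a sharper plan
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so the plan's row sums match the source marginal a
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the plan's column sums match the target marginal b
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def align_topics(prev_topics, curr_topics, eps=0.1):
    """Hypothetical helper: soft alignment of topics at t-1 to topics at t."""
    cost = [[cosine_cost(p, c) for c in curr_topics] for p in prev_topics]
    a = [1.0 / len(prev_topics)] * len(prev_topics)
    b = [1.0 / len(curr_topics)] * len(curr_topics)
    return sinkhorn(cost, a, b, eps)


# Toy example: the topics at time t are a permutation of those at t-1
prev = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
curr = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
plan = align_topics(prev, curr)
```

Because the plan is soft, a topic at time t-1 can send mass to several topics at time t, which is how a transport coupling can express a split or merge instead of a forced one-to-one chain; lowering `eps` pushes the plan toward a hard matching.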


LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models
Minh Chu Xuan | Tien-Phat Nguyen | Linh Ngo Van | Dinh Viet Sang | Nguyen Thi Ngoc Diep | Trung Le
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, document-level, and prone to hallucination, while prior white-box approaches require token probabilities that are often inaccessible. We propose LLM-XTM, a framework that integrates LLM-guided topic refinement with self-consistency uncertainty quantification, enabling black-box, stable, and scalable enhancement of cross-lingual topic models. Experiments on multilingual corpora show that LLM-XTM achieves superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.
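The abstract does not spell out how self-consistency uncertainty is computed. One common black-box pattern, shown here only as an assumed illustration and not as the LLM-XTM method, is to request the same topic refinement several times, keep the words that a majority of samples agree on, and use the mean pairwise Jaccard overlap between samples as a confidence score. No LLM is called: the sampled word lists are hard-coded stand-ins, and `self_consistency` and `threshold` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def jaccard(s, t):
    """Jaccard similarity between two word collections."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t) if s | t else 1.0


def self_consistency(samples, threshold=0.5):
    """samples: word lists returned by repeated (hypothetical) LLM calls.

    Returns (consensus words, agreement score in [0, 1]).
    """
    k = len(samples)
    # Count how many samples propose each word (once per sample)
    counts = Counter(w for sample in samples for w in set(sample))
    # Keep words proposed in at least a `threshold` fraction of samples
    consensus = sorted(w for w, c in counts.items() if c / k >= threshold)
    # Agreement: mean pairwise Jaccard similarity across samples
    pairs = list(combinations(samples, 2))
    agreement = sum(jaccard(x, y) for x, y in pairs) / len(pairs) if pairs else 1.0
    return consensus, agreement


# Stand-in outputs of three refinement requests for the same topic
samples = [
    ["bank", "loan", "credit", "interest"],
    ["bank", "loan", "credit", "river"],
    ["bank", "loan", "interest", "credit"],
]
consensus, agreement = self_consistency(samples)
```

Here the outlier word "river" appears in only one sample and is dropped. A low agreement score could be used to fall back to the topic model's original top words instead of the LLM's suggestion; that gating rule is likewise an assumption.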

Co-authors

Tien Phat Nguyen (1)

Tung Nguyen (1)

Ngo Van Dong (1)

Venues

ACL (1)
Findings (1)
