Hierarchical Topic Modeling via Contrastive Learning and Hyperbolic Embedding

Zhicheng Lin, HeGang Chen, Yuyin Lu, Yanghui Rao, Hao Xu, Hanjiang Lai


Abstract
Hierarchical topic modeling, which mines the implicit semantics of a corpus and automatically constructs hierarchical relationships among topics, has received considerable attention recently. However, current hierarchical topic models are mainly based on Euclidean space, which cannot adequately retain the implicit hierarchical semantic information in the corpus, leading to an irrational structure among the generated topics. Meanwhile, existing neural topic models based on Generative Adversarial Networks (GANs) perform satisfactorily, but they remain constrained by mode collapse due to the discontinuity of the latent space. To solve these problems, we adopt the hyperbolic space hypothesis and propose a novel GAN-based hierarchical topic model that mines high-quality topics by introducing contrastive learning to capture information from documents. Furthermore, topic embeddings are projected into hyperbolic space, whose distinct tree-like property preserves the implicit hierarchical semantics of documents. Finally, we use a multi-head self-attention mechanism to learn the implicit hierarchical semantics of topics and mine topic structure information. Experiments on real-world corpora demonstrate the strong performance of our model on topic coherence and topic diversity, as well as the rationality of the topic hierarchy.
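As a rough illustration of what projecting embeddings into hyperbolic space can look like, the sketch below maps Euclidean vectors into a Poincaré ball and measures geodesic distance there. The abstract does not say which hyperbolic model the authors use, so the Poincaré ball, the curvature parameter `c`, and the function names `exp_map_origin` and `poincare_distance` are all assumptions made for this example, not the paper's implementation.

```python
import math


def exp_map_origin(v, c=1.0):
    """Map a Euclidean (tangent) vector at the origin into the Poincare ball.

    Assumption: a ball of curvature -c; the paper's actual model may differ.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_distance(u, v, c=1.0):
    """Geodesic distance between two points strictly inside the ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - c * sq_u) * (1.0 - c * sq_v)
    # Clamp to guard against rounding slightly below 1 before acosh.
    arg = max(1.0, 1.0 + 2.0 * c * sq_diff / denom)
    return math.acosh(arg) / math.sqrt(c)


# Hypothetical topic vectors, chosen only for illustration.
parent = exp_map_origin([0.3, 0.4])
child = exp_map_origin([1.5, 2.0])
print(poincare_distance(parent, child))
```

Under this convention, the distance from the origin to a mapped point equals twice the Euclidean norm of the original vector. Distances grow quickly toward the boundary of the ball, which is the property that lets tree-like structures embed with low distortion.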
Anthology ID:
2024.lrec-main.712
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
8133–8143
URL:
https://aclanthology.org/2024.lrec-main.712
Cite (ACL):
Zhicheng Lin, HeGang Chen, Yuyin Lu, Yanghui Rao, Hao Xu, and Hanjiang Lai. 2024. Hierarchical Topic Modeling via Contrastive Learning and Hyperbolic Embedding. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 8133–8143, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Hierarchical Topic Modeling via Contrastive Learning and Hyperbolic Embedding (Lin et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.712.pdf