Explicit Bayesian Inference to Uncover the Latent Themes of Large Language Models

Raymond Li, Chuyuan Li, Gabriel Murray, Giuseppe Carenini


Abstract
Large language models (LLMs) have demonstrated impressive generative capabilities, yet their inner mechanisms remain largely opaque. In this work, we introduce a novel approach to interpreting the LLM generation process through the lens of an explicit Bayesian framework by inferring latent topic variables via variational inference. Specifically, we leverage a variational autoencoder-based neural topic model to dynamically approximate the posterior distribution over high-level latent topic variables at each generation step. By reconstructing the LLM’s next-token predictions through these latent topics and maintaining a regularized latent space, our method not only yields interpretable and diverse topic representations but also effectively captures semantic shifts throughout the text. We validate our approach on multiple datasets, showing that our latent topics outperform state-of-the-art topic models on intrinsic measures of coherence and diversity. Furthermore, we demonstrate the utility of our approach in downstream applications by using the inferred topic distributions to retrieve relevant demonstration examples for in-context learning, yielding significant gains on classification and summarization tasks.
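The abstract describes inferring topic proportions with a VAE-based neural topic model and reconstructing token predictions from them. The following is a minimal NumPy sketch of that general idea (amortized inference with the reparameterization trick, then a topic-word decoder); all weights, dimensions, and function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, HIDDEN, K = 2000, 128, 20  # vocab size, encoder width, number of topics

# Hypothetical randomly initialized parameters (for shape illustration only).
W_enc = rng.normal(0, 0.01, (VOCAB, HIDDEN))     # encoder weights
W_mu = rng.normal(0, 0.01, (HIDDEN, K))          # posterior mean head
W_logvar = rng.normal(0, 0.01, (HIDDEN, K))      # posterior log-variance head
beta = rng.normal(0, 0.01, (K, VOCAB))           # topic-word decoder matrix


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def infer_topics(bow):
    """Approximate the posterior q(z | x) over latent topics for one context."""
    h = np.tanh(bow @ W_enc)
    mu, logvar = h @ W_mu, h @ W_logvar
    eps = rng.standard_normal(mu.shape)          # reparameterization trick
    z = mu + np.exp(0.5 * logvar) * eps
    theta = softmax(z)                           # topic proportions on the simplex
    return theta, mu, logvar


def reconstruct(theta):
    """Mix topic-word distributions to form a distribution over the vocabulary."""
    return softmax(theta @ beta)


bow = rng.poisson(0.01, VOCAB).astype(float)     # toy bag-of-words context
theta, mu, logvar = infer_topics(bow)
p_words = reconstruct(theta)
```

In training, the reconstruction term (cross-entropy between `p_words` and the observed tokens) would be combined with a KL regularizer on `(mu, logvar)`, which is what keeps the latent space regularized as described above.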
Anthology ID:
2025.findings-acl.1123
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:
Findings | WS
Publisher:
Association for Computational Linguistics
Pages:
21819–21833
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1123/
Cite (ACL):
Raymond Li, Chuyuan Li, Gabriel Murray, and Giuseppe Carenini. 2025. Explicit Bayesian Inference to Uncover the Latent Themes of Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 21819–21833, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Explicit Bayesian Inference to Uncover the Latent Themes of Large Language Models (Li et al., Findings 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1123.pdf