Abstract
Probabilistic topic models such as latent Dirichlet allocation (LDA) are commonly used with Bayesian inference methods such as Gibbs sampling to learn posterior distributions over topic model parameters. We derive a novel measure of LDA topic quality using the variability of the posterior distributions. Compared to several existing baselines for automatic topic evaluation, the proposed metric achieves state-of-the-art correlations with human judgments of topic quality in experiments on three corpora. We additionally demonstrate that topic quality estimation can be further improved using a supervised estimator that combines multiple metrics.

- Anthology ID: D19-1349
- Volume: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
- Month: November
- Year: 2019
- Address: Hong Kong, China
- Venues: EMNLP | IJCNLP
- SIG: SIGDAT
- Publisher: Association for Computational Linguistics
- Pages: 3471–3477
- URL: https://aclanthology.org/D19-1349
- DOI: 10.18653/v1/D19-1349
- Cite (ACL): Linzi Xing, Michael J. Paul, and Giuseppe Carenini. 2019. Evaluating Topic Quality with Posterior Variability. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3471–3477, Hong Kong, China. Association for Computational Linguistics.
- Cite (Informal): Evaluating Topic Quality with Posterior Variability (Xing et al., EMNLP-IJCNLP 2019)
- PDF: https://preview.aclanthology.org/remove-xml-comments/D19-1349.pdf
- Code: lxing532/topic_variability